Most efficient way to batch delete S3 Files



I'd like to be able to batch delete thousands or tens of thousands of files at a time on S3. Each file would be anywhere from 1MB to 50MB. Naturally, I don't want the user (or my server) to be waiting while the files are in the process of being deleted. Hence, the questions:



1. How does S3 handle file deletion, especially when deleting large numbers of files?

2. Is there an efficient way to do this and make AWS do most of the work? By efficient, I mean making the fewest requests to S3, taking the least time, and using the least amount of resources on my servers.

Tags: amazon-s3, batch-processing

asked Apr 2 '15 at 4:06 by SudoKill, edited Apr 2 '15 at 4:58 by tpml7
5 Answers


Answer (score 7), answered Apr 2 '15 at 19:27 by Ed D'Azzo

AWS supports bulk deletion of up to 1000 objects per request using the S3 REST API and its various wrappers. This method assumes you know the S3 object keys you want to remove (that is, it's not designed to handle something like a retention policy, files over a certain size, etc.).

The S3 REST API can specify up to 1000 files to be deleted in a single request, which is much quicker than making individual requests. Remember, each request is an HTTP (thus TCP) request, so each request carries overhead. You just need to know the objects' keys and create an HTTP request (or use a wrapper in your language of choice). AWS provides great information on this feature and its usage. Just choose the method you're most comfortable with!

I'm assuming your use case involves end users specifying a number of specific files to delete at once, rather than initiating a task such as "purge all objects that refer to picture files" or "purge all files older than a certain date" (which I believe is easy to configure separately in S3).

If so, you'll know the keys that you need to delete. It also means the user will likely want more real-time feedback about whether their file was deleted successfully or not. References to exact keys are supposed to be very quick, since S3 was designed to scale efficiently despite handling an extremely large amount of data.

If not, you can look into asynchronous API calls. You can read a bit about how they'd work in general from this blog post, or search for how to do it in the language of your choice. This would allow the deletion request to take up its own thread, and the rest of the code can execute without making a user wait. Or, you could offload the request to a queue... But both of these options needlessly complicate either your code (asynchronous code can be annoying) or your environment (you'd need a service/daemon/container/server to handle the queue), so I'd avoid this scenario if possible.

Edit: I don't have the reputation to post more than 2 links, but you can see Amazon's comments on request rate and performance here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html And the S3 FAQ notes that bulk deletion is the way to go if possible.
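
For concreteness, here is a minimal sketch of one such bulk-delete request made through the AWS CLI wrapper for that API; the bucket name, the key names, and the delete.json filename are placeholders:

# delete.json lists the keys to remove in this single request (up to 1000).
cat > delete.json <<'EOF'
{
  "Objects": [
    { "Key": "path/to/first-object" },
    { "Key": "path/to/second-object" }
  ],
  "Quiet": true
}
EOF

# One HTTP request deletes every key listed above.
aws s3api delete-objects --bucket MY_BUCKET_NAME --delete file://delete.json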






Answer (score 8), answered Jun 22 '18 at 6:38 by antak, edited Oct 2 '18 at 1:27

The excruciatingly slow option is s3 rm --recursive if you actually like waiting.

Running parallel s3 rm --recursive with differing --include patterns is slightly faster, but a lot of time is still spent waiting, as each process individually fetches the entire key list in order to locally perform the --include pattern matching.

Enter bulk deletion.

I found I was able to get the most speed by deleting 1000 keys at a time using aws s3api delete-objects.

Here's an example:

cat file-of-keys | xargs -P8 -n1000 bash -c 'aws s3api delete-objects --bucket MY_BUCKET_NAME --delete "Objects=[$(printf "Key=%s," "$@")],Quiet=true"' _

• The -P8 option on xargs controls the parallelism. It's eight in this case, meaning 8 instances of 1000 deletions at a time.
• The -n1000 option tells xargs to bundle 1000 keys for each aws s3api delete-objects call.
• Removing ,Quiet=true or changing it to false will spew out server responses.
• Note: There's an easily missed _ at the end of that command line. @VladNikiforov posted an excellent explanation of what it's for in the comments, so I'll just point you there.

But how do you get file-of-keys?

If you already have your list of keys, good for you. Job complete.

If not, here's one way, I guess:

aws s3 ls "s3://MY_BUCKET_NAME/SOME_SUB_DIR" | sed -nre "s|[0-9-]+ [0-9:]+ +[0-9]+ |SOME_SUB_DIR|p" >file-of-keys
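
As a quick illustration of that last bullet (not from the original answer): bash -c assigns the first argument after the script to $0 rather than to "$@", so without a placeholder the first key would be silently dropped:

# The underscore fills $0, so all three keys land in "$@".
bash -c 'echo "keys: $@"' _ key1 key2 key3    # prints: keys: key1 key2 key3
# Without it, key1 becomes $0 and vanishes from "$@".
bash -c 'echo "keys: $@"' key1 key2 key3      # prints: keys: key2 key3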





Comments:

• (6) Great approach, but I found that listing the keys was the bottleneck. This is much faster: aws s3api list-objects --output text --bucket BUCKET --query 'Contents[].[Key]' | pv -l > BUCKET.keys And then removing objects (this was sufficient that going over 1 parallel process reaches the rate limits for object deletion): tail -n+0 BUCKET.keys | pv -l | grep -v -e "'" | tr '\n' '\0' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket BUCKET --delete "Objects=[$(printf "Key=%q," "$@")],Quiet=true"' _
  – SEK, Aug 13 '18 at 18:09

• You probably should also have stressed the importance of the _ at the end :) I missed it, and then it took me quite a while to understand why the first element gets skipped. The point is that bash -c passes all arguments as positional parameters, starting with $0, while "$@" only expands parameters starting with $1. So the underscore dummy is needed to fill the position of $0.
  – Vlad Nikiforov, Oct 1 '18 at 12:42

• @VladNikiforov Cheers, edited.
  – antak, Oct 2 '18 at 1:30

• One problem I've found with this approach (either from antak or Vlad) is that it's not easily resumable if there's an error. If you are deleting a lot of keys (10M in my case) you may have a network error, or a throttling error, that breaks this. To improve this, I've used split -l 1000 to split my keys file into 1000-key batches. Now for each file I can issue the delete command and then delete the file. If anything goes wrong, I can continue.
  – joelittlejohn, Apr 3 at 12:32
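
Building on that last comment, a minimal sketch of the resumable variant, assuming a file-of-keys in the same format as above and keys without embedded whitespace; the bucket name and the batch- prefix are placeholders:

# Break the key list into 1000-key batch files: batch-aa, batch-ab, ...
split -l 1000 file-of-keys batch-

# One delete-objects call per batch; remove the batch file only after the
# call succeeds, so re-running the loop resumes where it left off.
for f in batch-*; do
  aws s3api delete-objects \
    --bucket MY_BUCKET_NAME \
    --delete "Objects=[$(printf 'Key=%s,' $(cat "$f"))],Quiet=true" \
  && rm "$f"
done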


















Answer (score 3), answered Aug 9 '17 at 19:01 by dannyman

I was frustrated by the performance of the web console for this task. I found that the AWS CLI command does this well. For example:

aws s3 rm --recursive s3://my-bucket-name/huge-directory-full-of-files

For a large file hierarchy, this may take a considerable amount of time. You can set this running in a tmux or screen session and check back later.
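
For example, one way to leave that running unattended (the session name s3-cleanup is arbitrary):

# Start the deletion in a detached tmux session, then come back to it later.
tmux new-session -d -s s3-cleanup \
  'aws s3 rm --recursive s3://my-bucket-name/huge-directory-full-of-files'

tmux attach -t s3-cleanup    # check on progress whenever convenient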






Comments:

• (2) It looks like the aws s3 rm --recursive command deletes files individually. Although faster than the web console, when deleting lots of files it could be much faster if it deleted in bulk.
  – Brandon, Feb 22 '18 at 4:35



















Answer (score 1), answered Apr 9 at 20:59 by cam8001 (new contributor)

A neat trick is using lifecycle rules to handle the delete for you. You can queue a rule to delete the prefix or objects that you want and Amazon will just take care of the deletion.

https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
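
A minimal sketch of such a rule via the CLI, under the assumption that everything to be purged lives under a common prefix (here tmp/); the bucket name, rule ID, and lifecycle.json filename are placeholders, and S3 then performs the deletions in the background on its own schedule:

cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "purge-tmp",
      "Filter": { "Prefix": "tmp/" },
      "Status": "Enabled",
      "Expiration": { "Days": 1 }
    }
  ]
}
EOF

# Attach the rule; objects under tmp/ are expired roughly a day after creation.
aws s3api put-bucket-lifecycle-configuration \
  --bucket MY_BUCKET_NAME \
  --lifecycle-configuration file://lifecycle.json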






Answer (score 0), answered Apr 2 '15 at 19:42 by Bill B

Without knowing how you're managing the S3 buckets, this may or may not be particularly useful.

The AWS CLI tools have an option called "sync" which can be particularly effective for ensuring S3 has the correct objects. If you, or your users, are managing S3 from a local filesystem, you may be able to save a ton of work determining which objects need to be deleted by using the CLI tools.

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
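
A sketch of what that might look like, assuming the local directory is the source of truth for a given prefix (paths and bucket name are placeholders); --delete removes S3 objects that no longer exist locally, and --dryrun previews the changes first:

aws s3 sync ./local-data s3://MY_BUCKET_NAME/data --delete --dryrun   # preview what would be deleted/uploaded
aws s3 sync ./local-data s3://MY_BUCKET_NAME/data --delete            # apply the changes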






              share|improve this answer























                Your Answer








                StackExchange.ready(function()
                var channelOptions =
                tags: "".split(" "),
                id: "2"
                ;
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function()
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled)
                StackExchange.using("snippets", function()
                createEditor();
                );

                else
                createEditor();

                );

                function createEditor()
                StackExchange.prepareEditor(
                heartbeatType: 'answer',
                autoActivateHeartbeat: false,
                convertImagesToLinks: true,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: 10,
                bindNavPrevention: true,
                postfix: "",
                imageUploader:
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                ,
                onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                );



                );













                draft saved

                draft discarded


















                StackExchange.ready(
                function ()
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f679989%2fmost-efficient-way-to-batch-delete-s3-files%23new-answer', 'question_page');

                );

                Post as a guest















                Required, but never shown

























                5 Answers
                5






                active

                oldest

                votes








                5 Answers
                5






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                7














                AWS supports bulk deletion of up to 1000 objects per request using the S3 REST API and its various wrappers. This method assumes you know the S3 object keys you want to remove (that is, it's not designed to handle something like a retention policy, files that are over a certain size, etc).



                The S3 REST API can specify up to 1000 files to be deleted in a single request, which is must quicker than making individual requests. Remember, each request is an HTTP (thus TCP) request. So each request carries overhead. You just need to know the objects' keys and create an HTTP request (or use an wrapper in your language of choice). AWS provides great information on this feature and its usage. Just choose the method you're most comfortable with!



                I'm assuming your use case involves end users specifying a number of specific files to delete at once. Rather than initiating a task such as "purge all objects that refer to picture files" or "purge all files older than a certain date" (which I believe is easy to configure separately in S3).



                If so, you'll know the keys that you need to delete. It also means the user will like more real time feedback about whether their file was deleted successfully or not. References to exact keys are supposed to be very quick, since S3 was designed to scale efficiently despite handling an extremely large amount of data.



                If not, you can look into asynchronous API calls. You can read a bit about how they'd work in general from this blog post or search for how to do it in the language of your choice. This would allow the deletion request to take up its own thread, and the rest of the code can execute without making a user wait. Or, you could offload the request to a queue . . . But both of these options needlessly complicate either your code (asynchronous code can be annoying) or your environment (you'd need a service/daemon/container/server to handle the queue. So I'd avoid this scenario if possible.



                Edit: I don't have the reputation to post more than 2 links. But you can see Amazon's comments on request rate and performance here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html And the s3 faq comments that bulk deleiton is the way to go if possible.






                share|improve this answer



























                  7














                  AWS supports bulk deletion of up to 1000 objects per request using the S3 REST API and its various wrappers. This method assumes you know the S3 object keys you want to remove (that is, it's not designed to handle something like a retention policy, files that are over a certain size, etc).



                  The S3 REST API can specify up to 1000 files to be deleted in a single request, which is must quicker than making individual requests. Remember, each request is an HTTP (thus TCP) request. So each request carries overhead. You just need to know the objects' keys and create an HTTP request (or use an wrapper in your language of choice). AWS provides great information on this feature and its usage. Just choose the method you're most comfortable with!



                  I'm assuming your use case involves end users specifying a number of specific files to delete at once. Rather than initiating a task such as "purge all objects that refer to picture files" or "purge all files older than a certain date" (which I believe is easy to configure separately in S3).



                  If so, you'll know the keys that you need to delete. It also means the user will like more real time feedback about whether their file was deleted successfully or not. References to exact keys are supposed to be very quick, since S3 was designed to scale efficiently despite handling an extremely large amount of data.



                  If not, you can look into asynchronous API calls. You can read a bit about how they'd work in general from this blog post or search for how to do it in the language of your choice. This would allow the deletion request to take up its own thread, and the rest of the code can execute without making a user wait. Or, you could offload the request to a queue . . . But both of these options needlessly complicate either your code (asynchronous code can be annoying) or your environment (you'd need a service/daemon/container/server to handle the queue. So I'd avoid this scenario if possible.



                  Edit: I don't have the reputation to post more than 2 links. But you can see Amazon's comments on request rate and performance here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html And the s3 faq comments that bulk deleiton is the way to go if possible.






                  share|improve this answer

























                    7












                    7








                    7







                    AWS supports bulk deletion of up to 1000 objects per request using the S3 REST API and its various wrappers. This method assumes you know the S3 object keys you want to remove (that is, it's not designed to handle something like a retention policy, files that are over a certain size, etc).



                    The S3 REST API can specify up to 1000 files to be deleted in a single request, which is must quicker than making individual requests. Remember, each request is an HTTP (thus TCP) request. So each request carries overhead. You just need to know the objects' keys and create an HTTP request (or use an wrapper in your language of choice). AWS provides great information on this feature and its usage. Just choose the method you're most comfortable with!



                    I'm assuming your use case involves end users specifying a number of specific files to delete at once. Rather than initiating a task such as "purge all objects that refer to picture files" or "purge all files older than a certain date" (which I believe is easy to configure separately in S3).



                    If so, you'll know the keys that you need to delete. It also means the user will like more real time feedback about whether their file was deleted successfully or not. References to exact keys are supposed to be very quick, since S3 was designed to scale efficiently despite handling an extremely large amount of data.



                    If not, you can look into asynchronous API calls. You can read a bit about how they'd work in general from this blog post or search for how to do it in the language of your choice. This would allow the deletion request to take up its own thread, and the rest of the code can execute without making a user wait. Or, you could offload the request to a queue . . . But both of these options needlessly complicate either your code (asynchronous code can be annoying) or your environment (you'd need a service/daemon/container/server to handle the queue. So I'd avoid this scenario if possible.



                    Edit: I don't have the reputation to post more than 2 links. But you can see Amazon's comments on request rate and performance here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html And the s3 faq comments that bulk deleiton is the way to go if possible.






                    share|improve this answer













                    AWS supports bulk deletion of up to 1000 objects per request using the S3 REST API and its various wrappers. This method assumes you know the S3 object keys you want to remove (that is, it's not designed to handle something like a retention policy, files that are over a certain size, etc).



                    The S3 REST API can specify up to 1000 files to be deleted in a single request, which is must quicker than making individual requests. Remember, each request is an HTTP (thus TCP) request. So each request carries overhead. You just need to know the objects' keys and create an HTTP request (or use an wrapper in your language of choice). AWS provides great information on this feature and its usage. Just choose the method you're most comfortable with!



                    I'm assuming your use case involves end users specifying a number of specific files to delete at once. Rather than initiating a task such as "purge all objects that refer to picture files" or "purge all files older than a certain date" (which I believe is easy to configure separately in S3).



                    If so, you'll know the keys that you need to delete. It also means the user will like more real time feedback about whether their file was deleted successfully or not. References to exact keys are supposed to be very quick, since S3 was designed to scale efficiently despite handling an extremely large amount of data.



                    If not, you can look into asynchronous API calls. You can read a bit about how they'd work in general from this blog post or search for how to do it in the language of your choice. This would allow the deletion request to take up its own thread, and the rest of the code can execute without making a user wait. Or, you could offload the request to a queue . . . But both of these options needlessly complicate either your code (asynchronous code can be annoying) or your environment (you'd need a service/daemon/container/server to handle the queue. So I'd avoid this scenario if possible.



                    Edit: I don't have the reputation to post more than 2 links. But you can see Amazon's comments on request rate and performance here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html And the s3 faq comments that bulk deleiton is the way to go if possible.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Apr 2 '15 at 19:27









                    Ed D'AzzoEd D'Azzo

                    1061




                    1061























                        8














                        The excruciatingly slow option is s3 rm --recursive if you actually like waiting.



                        Running parallel s3 rm --recursive with differing --include patterns is slightly faster but a lot of time is still spent waiting, as each process individually fetches the entire key list in order to locally perform the --include pattern matching.



                        Enter bulk deletion.



                        I found I was able to get the most speed by deleting 1000 keys at a time using aws s3api delete-objects.



                        Here's an example:



                        cat file-of-keys | xargs -P8 -n1000 bash -c 'aws s3api delete-objects --bucket MY_BUCKET_NAME --delete "Objects=[$(printf "Key=%s," "$@")],Quiet=true"' _


                        • The -P8 option on xargs controls the parallelism. It's eight in this case, meaning 8 instances of 1000 deletions at a time.

                        • The -n1000 option tells xargs to bundle 1000 keys for each aws s3api delete-objects call.

                        • Removing ,Quiet=true or changing it to false will spew out server responses.

                        • Note: There's an easily missed _ at the end of that command line. @VladNikiforov posted an excellent commentary of what it's for in the comment so I'm going to just link to that.

                        But how do you get file-of-keys?



                        If you already have your list of keys, good for you. Job complete.



                        If not, here's one way I guess:



                        aws s3 ls "s3://MY_BUCKET_NAME/SOME_SUB_DIR" | sed -nre "s|[0-9-]+ [0-9:]+ +[0-9]+ |SOME_SUB_DIR|p" >file-of-keys





                        share|improve this answer




















                        • 6





                          Great approach, but I found that listing the keys was the bottleneck. This is much faster: aws s3api list-objects --output text --bucket BUCKET --query 'Contents[].[Key]' | pv -l > BUCKET.keys And then removing objects (this was sufficient that going over 1 parallel process reaches the rate limits for object deletion): tail -n+0 BUCKET.keys | pv -l | grep -v -e "'" | tr 'n' '' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket BUCKET --delete "Objects=[$(printf "Key=%q," "$@")],Quiet=true"' _

                          – SEK
                          Aug 13 '18 at 18:09












                        • You probably should also have stressed the importance on _ in the end :) I missed it and then it took me quite a while to understand why the first element gets skipped. The point is that bash -c passes all arguments as positional parameters, starting with $0, while "$@" only processes parameters starting with $1. So the underscore dummy is needed to fill the position of $0.

                          – Vlad Nikiforov
                          Oct 1 '18 at 12:42












                        • @VladNikiforov Cheers, edited.

                          – antak
                          Oct 2 '18 at 1:30











                        • One problem I've found with this approach (either from antak or Vlad) is that it's not easily resumable if there's an error. If you are deleting a lot keys (10M in my case) you may have a network error, or throttling error, that breaks this. So to improve this, I've used split -l 1000 to split my keys file into 1000 key batches. Now for each file I can issue the delete command then delete the file. If anything goes wrong, I can continue.

                          – joelittlejohn
                          Apr 3 at 12:32















                        8














                        The excruciatingly slow option is s3 rm --recursive if you actually like waiting.



                        Running parallel s3 rm --recursive with differing --include patterns is slightly faster but a lot of time is still spent waiting, as each process individually fetches the entire key list in order to locally perform the --include pattern matching.



                        Enter bulk deletion.



                        I found I was able to get the most speed by deleting 1000 keys at a time using aws s3api delete-objects.



                        Here's an example:



                        cat file-of-keys | xargs -P8 -n1000 bash -c 'aws s3api delete-objects --bucket MY_BUCKET_NAME --delete "Objects=[$(printf "Key=%s," "$@")],Quiet=true"' _


                        • The -P8 option on xargs controls the parallelism. It's eight in this case, meaning 8 instances of 1000 deletions at a time.

                        • The -n1000 option tells xargs to bundle 1000 keys for each aws s3api delete-objects call.

                        • Removing ,Quiet=true or changing it to false will spew out server responses.

                        • Note: There's an easily missed _ at the end of that command line. @VladNikiforov posted an excellent commentary of what it's for in the comment so I'm going to just link to that.

                        But how do you get file-of-keys?



                        If you already have your list of keys, good for you. Job complete.



                        If not, here's one way I guess:



                        aws s3 ls "s3://MY_BUCKET_NAME/SOME_SUB_DIR" | sed -nre "s|[0-9-]+ [0-9:]+ +[0-9]+ |SOME_SUB_DIR|p" >file-of-keys





                        share|improve this answer




















                        • 6





                          Great approach, but I found that listing the keys was the bottleneck. This is much faster: aws s3api list-objects --output text --bucket BUCKET --query 'Contents[].[Key]' | pv -l > BUCKET.keys And then removing objects (this was sufficient that going over 1 parallel process reaches the rate limits for object deletion): tail -n+0 BUCKET.keys | pv -l | grep -v -e "'" | tr 'n' '' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket BUCKET --delete "Objects=[$(printf "Key=%q," "$@")],Quiet=true"' _

                          – SEK
                          Aug 13 '18 at 18:09












                        • You probably should also have stressed the importance on _ in the end :) I missed it and then it took me quite a while to understand why the first element gets skipped. The point is that bash -c passes all arguments as positional parameters, starting with $0, while "$@" only processes parameters starting with $1. So the underscore dummy is needed to fill the position of $0.

                          – Vlad Nikiforov
                          Oct 1 '18 at 12:42












                        • @VladNikiforov Cheers, edited.

                          – antak
                          Oct 2 '18 at 1:30











                        • One problem I've found with this approach (either from antak or Vlad) is that it's not easily resumable if there's an error. If you are deleting a lot keys (10M in my case) you may have a network error, or throttling error, that breaks this. So to improve this, I've used split -l 1000 to split my keys file into 1000 key batches. Now for each file I can issue the delete command then delete the file. If anything goes wrong, I can continue.

                          – joelittlejohn
                          Apr 3 at 12:32













                        8












                        8








                        8







                        The excruciatingly slow option is s3 rm --recursive if you actually like waiting.



                        Running parallel s3 rm --recursive with differing --include patterns is slightly faster but a lot of time is still spent waiting, as each process individually fetches the entire key list in order to locally perform the --include pattern matching.



                        Enter bulk deletion.



                        I found I was able to get the most speed by deleting 1000 keys at a time using aws s3api delete-objects.



                        Here's an example:



                        cat file-of-keys | xargs -P8 -n1000 bash -c 'aws s3api delete-objects --bucket MY_BUCKET_NAME --delete "Objects=[$(printf "Key=%s," "$@")],Quiet=true"' _


                        • The -P8 option on xargs controls the parallelism. It's eight in this case, meaning 8 instances of 1000 deletions at a time.

                        • The -n1000 option tells xargs to bundle 1000 keys for each aws s3api delete-objects call.

                        • Removing ,Quiet=true or changing it to false will spew out server responses.

                        • Note: There's an easily missed _ at the end of that command line. @VladNikiforov posted an excellent commentary of what it's for in the comment so I'm going to just link to that.

                        But how do you get file-of-keys?



                        If you already have your list of keys, good for you. Job complete.



                        If not, here's one way I guess:



                        aws s3 ls "s3://MY_BUCKET_NAME/SOME_SUB_DIR" | sed -nre "s|[0-9-]+ [0-9:]+ +[0-9]+ |SOME_SUB_DIR|p" >file-of-keys





                        share|improve this answer















                        The excruciatingly slow option is s3 rm --recursive if you actually like waiting.



                        Running parallel s3 rm --recursive with differing --include patterns is slightly faster but a lot of time is still spent waiting, as each process individually fetches the entire key list in order to locally perform the --include pattern matching.



                        Enter bulk deletion.



                        I found I was able to get the most speed by deleting 1000 keys at a time using aws s3api delete-objects.



                        Here's an example:



                        cat file-of-keys | xargs -P8 -n1000 bash -c 'aws s3api delete-objects --bucket MY_BUCKET_NAME --delete "Objects=[$(printf "Key=%s," "$@")],Quiet=true"' _


                        • The -P8 option on xargs controls the parallelism. It's eight in this case, meaning 8 instances of 1000 deletions at a time.

                        • The -n1000 option tells xargs to bundle 1000 keys for each aws s3api delete-objects call.

                        • Removing ,Quiet=true or changing it to false will spew out server responses.

                        • Note: There's an easily missed _ at the end of that command line. @VladNikiforov posted an excellent commentary of what it's for in the comment so I'm going to just link to that.

                        But how do you get file-of-keys?



                        If you already have your list of keys, good for you. Job complete.



                        If not, here's one way I guess:



                        aws s3 ls "s3://MY_BUCKET_NAME/SOME_SUB_DIR" | sed -nre "s|[0-9-]+ [0-9:]+ +[0-9]+ |SOME_SUB_DIR|p" >file-of-keys






                        share|improve this answer














                        share|improve this answer



                        share|improve this answer








                        edited Oct 2 '18 at 1:27

























                        answered Jun 22 '18 at 6:38









                        antakantak

                        18915




                        18915







                        • 6





                          Great approach, but I found that listing the keys was the bottleneck. This is much faster: aws s3api list-objects --output text --bucket BUCKET --query 'Contents[].[Key]' | pv -l > BUCKET.keys And then removing objects (this was sufficient that going over 1 parallel process reaches the rate limits for object deletion): tail -n+0 BUCKET.keys | pv -l | grep -v -e "'" | tr 'n' '' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket BUCKET --delete "Objects=[$(printf "Key=%q," "$@")],Quiet=true"' _

                          – SEK
                          Aug 13 '18 at 18:09












                        • You probably should also have stressed the importance on _ in the end :) I missed it and then it took me quite a while to understand why the first element gets skipped. The point is that bash -c passes all arguments as positional parameters, starting with $0, while "$@" only processes parameters starting with $1. So the underscore dummy is needed to fill the position of $0.

                          – Vlad Nikiforov
                          Oct 1 '18 at 12:42












                        • @VladNikiforov Cheers, edited.

                          – antak
                          Oct 2 '18 at 1:30











                        • One problem I've found with this approach (either from antak or Vlad) is that it's not easily resumable if there's an error. If you are deleting a lot keys (10M in my case) you may have a network error, or throttling error, that breaks this. So to improve this, I've used split -l 1000 to split my keys file into 1000 key batches. Now for each file I can issue the delete command then delete the file. If anything goes wrong, I can continue.

                          – joelittlejohn
                          Apr 3 at 12:32












                        • 6





                          Great approach, but I found that listing the keys was the bottleneck. This is much faster: aws s3api list-objects --output text --bucket BUCKET --query 'Contents[].[Key]' | pv -l > BUCKET.keys And then removing objects (this was sufficient that going over 1 parallel process reaches the rate limits for object deletion): tail -n+0 BUCKET.keys | pv -l | grep -v -e "'" | tr 'n' '' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket BUCKET --delete "Objects=[$(printf "Key=%q," "$@")],Quiet=true"' _

                          – SEK
                          Aug 13 '18 at 18:09












                        • You probably should also have stressed the importance on _ in the end :) I missed it and then it took me quite a while to understand why the first element gets skipped. The point is that bash -c passes all arguments as positional parameters, starting with $0, while "$@" only processes parameters starting with $1. So the underscore dummy is needed to fill the position of $0.

                          – Vlad Nikiforov
                          Oct 1 '18 at 12:42












                        • @VladNikiforov Cheers, edited.

                          – antak
                          Oct 2 '18 at 1:30











                        • One problem I've found with this approach (either from antak or Vlad) is that it's not easily resumable if there's an error. If you are deleting a lot keys (10M in my case) you may have a network error, or throttling error, that breaks this. So to improve this, I've used split -l 1000 to split my keys file into 1000 key batches. Now for each file I can issue the delete command then delete the file. If anything goes wrong, I can continue.

                          – joelittlejohn
                          Apr 3 at 12:32







                        6




                        6





                        Great approach, but I found that listing the keys was the bottleneck. This is much faster: aws s3api list-objects --output text --bucket BUCKET --query 'Contents[].[Key]' | pv -l > BUCKET.keys And then removing objects (this was sufficient that going over 1 parallel process reaches the rate limits for object deletion): tail -n+0 BUCKET.keys | pv -l | grep -v -e "'" | tr 'n' '' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket BUCKET --delete "Objects=[$(printf "Key=%q," "$@")],Quiet=true"' _

                        – SEK
                        Aug 13 '18 at 18:09






                        Great approach, but I found that listing the keys was the bottleneck. This is much faster: aws s3api list-objects --output text --bucket BUCKET --query 'Contents[].[Key]' | pv -l > BUCKET.keys And then removing objects (this was sufficient that going over 1 parallel process reaches the rate limits for object deletion): tail -n+0 BUCKET.keys | pv -l | grep -v -e "'" | tr 'n' '' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket BUCKET --delete "Objects=[$(printf "Key=%q," "$@")],Quiet=true"' _

                        – SEK
                        Aug 13 '18 at 18:09














                        You probably should also have stressed the importance on _ in the end :) I missed it and then it took me quite a while to understand why the first element gets skipped. The point is that bash -c passes all arguments as positional parameters, starting with $0, while "$@" only processes parameters starting with $1. So the underscore dummy is needed to fill the position of $0.

                        – Vlad Nikiforov
                        Oct 1 '18 at 12:42






                        You probably should also have stressed the importance on _ in the end :) I missed it and then it took me quite a while to understand why the first element gets skipped. The point is that bash -c passes all arguments as positional parameters, starting with $0, while "$@" only processes parameters starting with $1. So the underscore dummy is needed to fill the position of $0.

                        – Vlad Nikiforov
                        Oct 1 '18 at 12:42














                        @VladNikiforov Cheers, edited.

                        – antak
                        Oct 2 '18 at 1:30





                        @VladNikiforov Cheers, edited.

                        – antak
                        Oct 2 '18 at 1:30













                        One problem I've found with this approach (either from antak or Vlad) is that it's not easily resumable if there's an error. If you are deleting a lot keys (10M in my case) you may have a network error, or throttling error, that breaks this. So to improve this, I've used split -l 1000 to split my keys file into 1000 key batches. Now for each file I can issue the delete command then delete the file. If anything goes wrong, I can continue.

                        – joelittlejohn
                        Apr 3 at 12:32





                        One problem I've found with this approach (either from antak or Vlad) is that it's not easily resumable if there's an error. If you are deleting a lot keys (10M in my case) you may have a network error, or throttling error, that breaks this. So to improve this, I've used split -l 1000 to split my keys file into 1000 key batches. Now for each file I can issue the delete command then delete the file. If anything goes wrong, I can continue.

                        – joelittlejohn
                        Apr 3 at 12:32











                        3














                        I was frustrated by the performance of the web console for this task. I found that the AWS CLI command does this well. For example:



                        aws s3 rm --recursive s3://my-bucket-name/huge-directory-full-of-files



                        For a large file hierarchy, this may take some considerable amount of time. You can set this running in a tmux or screen session and check back later.






                        share|improve this answer


















                        • 2





                          It looks like the aws s3 rm --recursive command deletes files individually. Although faster than the web console, when deleting lots of files, it could be much faster if it deleted in bulk

                          – Brandon
                          Feb 22 '18 at 4:35
















                        3














                        I was frustrated by the performance of the web console for this task. I found that the AWS CLI command does this well. For example:



                        aws s3 rm --recursive s3://my-bucket-name/huge-directory-full-of-files



                        For a large file hierarchy, this may take some considerable amount of time. You can set this running in a tmux or screen session and check back later.






                        share|improve this answer


















                        • 2





                          It looks like the aws s3 rm --recursive command deletes files individually. Although faster than the web console, when deleting lots of files, it could be much faster if it deleted in bulk

                          – Brandon
                          Feb 22 '18 at 4:35














                        3












                        3








                        3







                        I was frustrated by the performance of the web console for this task. I found that the AWS CLI command does this well. For example:



                        aws s3 rm --recursive s3://my-bucket-name/huge-directory-full-of-files



                        For a large file hierarchy, this may take some considerable amount of time. You can set this running in a tmux or screen session and check back later.






                        share|improve this answer













                        I was frustrated by the performance of the web console for this task. I found that the AWS CLI command does this well. For example:



                        aws s3 rm --recursive s3://my-bucket-name/huge-directory-full-of-files



                        For a large file hierarchy, this may take some considerable amount of time. You can set this running in a tmux or screen session and check back later.







                        share|improve this answer












                        share|improve this answer



                        share|improve this answer










                        answered Aug 9 '17 at 19:01









                        dannymandannyman

                        286312




                        286312







                        • 2





                          It looks like the aws s3 rm --recursive command deletes files individually. Although faster than the web console, when deleting lots of files, it could be much faster if it deleted in bulk

                          – Brandon
                          Feb 22 '18 at 4:35













                        • 2





                          It looks like the aws s3 rm --recursive command deletes files individually. Although faster than the web console, when deleting lots of files, it could be much faster if it deleted in bulk

                          – Brandon
                          Feb 22 '18 at 4:35








                        2




                        2





                        It looks like the aws s3 rm --recursive command deletes files individually. Although faster than the web console, when deleting lots of files, it could be much faster if it deleted in bulk

                        – Brandon
                        Feb 22 '18 at 4:35






                        It looks like the aws s3 rm --recursive command deletes files individually. Although faster than the web console, when deleting lots of files, it could be much faster if it deleted in bulk

                        – Brandon
                        Feb 22 '18 at 4:35












                        1














                        A neat trick is using lifecycle rules to handle the delete for you. You can queue a rule to delete the prefix or objects that you want and Amazon will just take care of the deletion.



                        https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html






                        share|improve this answer








                        New contributor




                        cam8001 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                        Check out our Code of Conduct.
























                          1














                          A neat trick is using lifecycle rules to handle the delete for you. You can queue a rule to delete the prefix or objects that you want and Amazon will just take care of the deletion.



                          https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html






                          share|improve this answer








                          New contributor




                          cam8001 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                          Check out our Code of Conduct.






















                            1












                            1








                            1







                            A neat trick is using lifecycle rules to handle the delete for you. You can queue a rule to delete the prefix or objects that you want and Amazon will just take care of the deletion.



                            https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html






                            share|improve this answer








                            New contributor




                            cam8001 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                            Check out our Code of Conduct.










                            A neat trick is using lifecycle rules to handle the delete for you. You can queue a rule to delete the prefix or objects that you want and Amazon will just take care of the deletion.



                            https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html







                            share|improve this answer








                            New contributor




                            cam8001 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                            Check out our Code of Conduct.









                            share|improve this answer



                            share|improve this answer






                            New contributor




                            cam8001 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                            Check out our Code of Conduct.









                            answered Apr 9 at 20:59









                            cam8001cam8001

                            1112




                            1112




                            New contributor




                            cam8001 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                            Check out our Code of Conduct.





                            New contributor





                            cam8001 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                            Check out our Code of Conduct.






                            cam8001 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                            Check out our Code of Conduct.





















                                0














                                Without knowing how you're managing the s3 buckets, this may or may not be particularly useful.



                                The AWS CLI tools has an option called "sync" which can be particularly effective to ensure s3 has the correct objects. If you, or your users, are managing S3 from a local filesystem, you may be able to save a ton of work determining which objects need to be deleted by using the CLI tools.



                                http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html






answered Apr 2 '15 at 19:42 by Bill B