Parallel Processing With Xargs

OK. So I found a nifty trick with the venerable Unix util xargs. If you aren’t familiar, xargs lets you get around a limit on command-line length: the system will only let you pass so many arguments to an application in a single invocation. Xargs collects an arbitrary number of arguments from stdin and passes them to your application in batches, over and over until all of the arguments are used up. By default each batch holds as many arguments as fit on one command line; some implementations, such as BSD xargs, also cap a batch at 5000 arguments.

That alone is cool, but the fact that the application is called over and over gives us the opportunity to use xargs as sort of a poor man’s multithreading (strictly speaking, separate processes rather than threads). Thankfully, the xargs implementers already thought of this, and provide us with two arguments to xargs itself, namely ‘-n x’ and ‘-P x’. The ‘-n’ argument tells xargs the maximum number of arguments to send to the application in each invocation, and ‘-P’ tells xargs how many instances of the application it may run concurrently. Note that ‘-P’ is an extension rather than part of POSIX, though both GNU and BSD xargs support it.
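To see what ‘-n’ does on its own, here is a minimal sketch. The five letters are made-up input, and `echo` just stands in for a real application:

```shell
# Feed five arguments to xargs, at most two per invocation of echo.
# Each output line corresponds to one separate call to echo.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo)
printf '%s\n' "$batches"
# a b
# c d
# e
```

Three lines of output means echo ran three times: twice with two arguments and once with the leftover.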

For example:

  $ echo <big list of args> | xargs -n 1 -P 4 < do something >

tells xargs to hand each invocation exactly one argument, and to keep up to four invocations running at once. This will allow you to use all of those processors on that machine of yours instead of waiting for one to process all of the arguments sequentially!
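Here is a runnable sketch of the same idea. The item names are hypothetical, and the little `sh -c` wrapper is only a placeholder for whatever you actually want to run on each argument:

```shell
# Hypothetical input: eight made-up item names, one per line.
items=$(printf 'item%s\n' 1 2 3 4 5 6 7 8)

# -n 1: one argument per invocation; -P 4: up to four invocations at once.
# The trailing "_" fills $0 so that the item lands in $1 inside the wrapper.
results=$(printf '%s\n' "$items" | xargs -n 1 -P 4 sh -c 'echo "done: $1"' _)

# Jobs finish in whatever order they finish, so sort before reading.
printf '%s\n' "$results" | sort
```

One thing to keep in mind: with ‘-P’ the output order is not guaranteed, so don’t rely on results coming back in the order you sent them.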

As always, rock on.

/korishev
