Creating Commands with xargs

By Paulus, 23 February, 2008

I have a couple of folders with a ton of compressed files in them. I'm not talking about 10 or 20. No, I'm talking about a couple thousand. Instead of spending all eternity uncompressing the files by hand, I did some research into how to have the command line do all the glorious work for me.

One of the requirements was that I didn't want to decompress any Japanese-only files, which were indicated by '(J)'. However, if the file was American, European, and Japanese, then I did want the file decompressed. Another problem I had was that the indication of the type of file was not consistent: the letters in a combined tag could appear in any order. It could have been UE, EU, JUE, UJE, or UEJ.

I wanted to keep the files separated, so I created another directory called dir2, which is the directory I'm in when I run the following command:

$ ls -1 /home/paulus/dir1 | grep -E '.*\((E|U|EU|UE|JUE|UJE|UEJ|JU|UJ)\).*\.zip' | xargs -d'\n' -L 1 -I '{}' unzip '../dir1/{}'

The first command, ls, is well known to anyone who has used a *nix system. Here it lists the files in /home/paulus/dir1. The -1 option prints one file name per line (ls already does this when its output goes to a pipe, so the option just makes it explicit). The output is piped to the grep command.
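To see what ls -1 hands to the next stage, here is a small sketch that runs it against a throwaway directory. The directory and the file names are made up for illustration and are not from my actual folders.

```shell
# Create a temporary directory with two hypothetical file names.
demo=$(mktemp -d)
touch "$demo/Game A (U).zip" "$demo/Game B (J).zip"

# Capture the listing: one file name per line, directory prefix not included.
listing=$(ls -1 "$demo")
printf '%s\n' "$listing"

# Clean up the throwaway directory.
rm -r "$demo"
```

Note that ls prints bare file names without the directory, which is why the unzip step has to put '../dir1/' back in front of each name.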

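The grep command gets a little messy with the regular expression. The -E option turns on extended regular expressions. The . (period) matches any single character and the * (asterisk) repeats it zero or more times, so together '.*' is saying "anything from the beginning to the first parenthesis." We have to use the backslash to escape that parenthesis, otherwise it will be interpreted as the start of a group instead of a literal character.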

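Notice how the second parenthesis is not escaped. That's because we're using it to group alternatives, much like the condition in an if statement. Between the literal '\(' and '\)', the | acts as "or". So, return true if E or U or EU or UE or JUE or UJE or UEJ or UJ or JU is found between the parentheses. A lone J is not in the list, which is what keeps the '(J)' files out. The '.*' is saying anything after the '()', then we have the extension, '\.zip', with the period escaped so that it matches a literal dot, so we're only looking at zip files.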
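A quick way to check the pattern before letting it loose on real files is to feed it a few names by hand. This is only a sketch: the file names below are hypothetical, picked to cover a match, a Japanese-only file, a combined tag, and a non-zip file.

```shell
# Hypothetical file names, used only to exercise the pattern.
names='Game A (U).zip
Game B (J).zip
Game C (JUE).zip
Game D (E) (v1.1).zip
Game E (UE).rar'

# The same pattern used in the command above.
pattern='.*\((E|U|EU|UE|JUE|UJE|UEJ|JU|UJ)\).*\.zip'

# Keep only the lines that match.
matches=$(printf '%s\n' "$names" | grep -E "$pattern")
printf '%s\n' "$matches"
```

Game A, Game C, and Game D come through. Game B is dropped because '(J)' is not one of the alternatives, and Game E is dropped because it does not contain '.zip'.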

Now that the regular expression is out of the way, we can move on to the xargs command. xargs is used to build and execute command lines from standard input. Some programs, unzip among them, expect file names as arguments instead of reading them from a pipe. With the -d'\n' parameter (a GNU xargs option), we are saying that each line is a separate argument, so the spaces in a file name don't split it apart. The -L 1 parameter limits each command to one line of input. -I '{}' is saying that we are going to replace {} with an argument, such as one of the file names received from stdin. Since -I already implies one line per command, -L 1 is redundant here, and GNU xargs may warn that the two options are mutually exclusive.

We finally get to the point where we can use the unzip command as if it were on the command line. xargs will execute 'unzip ../dir1/{}', but the actual command that gets run is unzip ../dir1/file.zip, where file.zip is one of the arguments that it was passed.
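Before running unzip a couple thousand times, it can help to preview the commands xargs would build. The sketch below puts echo in front of unzip so nothing is actually decompressed. It assumes GNU xargs (for -d), uses two hypothetical file names in place of the grep output, and leaves out -L 1 because GNU xargs documents -I as implying it.

```shell
# Two hypothetical file names standing in for the output of grep.
# echo prints each command line instead of running unzip.
preview=$(printf '%s\n' 'Game A (U).zip' 'Game C (JUE).zip' |
  xargs -d '\n' -I '{}' echo unzip '../dir1/{}')

printf '%s\n' "$preview"
```

Each input line becomes one unzip command with '../dir1/' in front of the file name. Once the preview looks right, dropping the echo runs the real thing.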