Escaping spaces in filenames

While messing around with my SVN versioning, I found a useful fix for escaping special characters in bash filenames.

Background - piping with scripts

Piping is powerful. It allows you to feed the text output of one simple unix command into the input of another, and compose them with apparently endless complexity. That is, until you have spaces or apostrophes in the names, and then you're screwed, since the text items are expected to be delimited by certain characters. If those characters also appear in the actual content, there's no way to tell a separator from part of a name.
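Here is a minimal sketch of the problem. The helper count_args is my own name for illustration, and the filename is one of this blog's tag pages. Unquoted command substitution splits its output on whitespace, so a single filename with a space in it arrives as two arguments.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it reports how many arguments it received.
count_args() { echo "$#"; }

# One filename containing a space.
name='tag_curiosity collective.html'

# Unquoted substitution: the shell splits the output on the space.
unquoted=$(count_args $(printf '%s\n' "$name"))

# Quoted substitution: the output stays as one argument.
quoted=$(count_args "$(printf '%s\n' "$name")")

echo "unquoted: $unquoted, quoted: $quoted"
```

Quoting fixes the single-name case, but it is no help when the output holds several names, since they would all be glued into one argument.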

The Objective

I wanted to do the simplest thing. I've been using svn to maintain the full version of my blog as it evolves, making sure I don't lose anything. This is especially important since the whole build is maintained as a set of related directories, and hence I'm increasingly dependent on the presence and configuration of third party tools in the structure of my editing directory.

Anyway, what tends to happen is that as I'm working on the blog, new files get created, which svn ignores when I run... svn commit -m "Here is a change" ...since they haven't been manually added yet through an 'svn add' command. Equally, files which I've deliberately removed end up lurking around in the repository, and get added back locally when I do an svn update to recover from a particularly foolish edit.

How to Synchronise a filesystem to an SVN repository - adding/removing to match your working version

You can synchronise your working copy with an SVN repository using...

svn status | awk '/^\?/' | sed 's/^? *//' | sed 's/ /\\ /g' | xargs svn add

...and...

svn status | awk '/^!/' | sed 's/^! *//' | sed 's/ /\\ /g' | xargs svn remove

...followed by...

svn commit -m "your message here"

The first two commands each process the output of 'svn status':

  • searching out the filenames flagged with a question mark, stripping the question mark from the filename and automatically adding them
  • searching out the filenames flagged with an exclamation mark, stripping the exclamation mark and automatically removing them
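The filtering steps can be tried without touching a repository. This sketch feeds some made-up 'svn status' output (the filenames are invented for illustration) through the same kind of awk and sed filters. The patterns are anchored to the start of the line, so a '?' or '!' elsewhere in a path is left alone.

```shell
#!/bin/bash
# Made-up 'svn status' output: status flag in column one, then padding, then the path.
sample_status='?       tag_curiosity collective.html
!       old_post.html
M       index.html
?       2009-01-01.html'

# Unversioned files: keep lines starting with '?', then strip the flag and padding.
to_add=$(printf '%s\n' "$sample_status" | awk '/^\?/' | sed 's/^? *//')

# Missing files: keep lines starting with '!', then strip the flag and padding.
to_remove=$(printf '%s\n' "$sample_status" | awk '/^!/' | sed 's/^! *//')

printf 'add:\n%s\nremove:\n%s\n' "$to_add" "$to_remove"
```

The modified file (flag 'M') is ignored by both filters, which is what you want, since svn commit already picks it up.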

The bit which says... sed 's/ /\\ /g' ...is what solves a common bash filename escaping problem which I had, and puts backslashes in wherever there are spaces.

This could be replaced with the following sed command, which can be modified to handle any special characters.

sed 's/\([ ]\)/\\\1/g'

...probably, though I haven't tested it much. It backslash-escapes any character you place in the square brackets: the matched character is re-inserted in place with a backslash before it.

I've inserted a few of the most common ones in the final version below...

sed 's/\([ "(){}$#\&~'"'"']\)/\\\1/g'

This extended version lists the characters to be escaped inside the square brackets, including the sequence '"'"' which places a single quote inside a single-quoted string (close the quotes, add a double-quoted single quote, reopen the quotes). It then replaces any character matched from this set with a backslash (written as \\ for sed) followed by the backreference \1, which retrieves the character that was actually matched.
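One way to check an escaping filter like this without a repository is to round-trip a few awkward names through xargs, which treats a backslash as protecting the next character. This is a sketch under assumptions: escape_names is my own wrapper name, the second filename is invented, and printf stands in for the svn command.

```shell
#!/bin/bash
# escape_names (hypothetical wrapper): backslash-escape the listed special characters.
escape_names() {
  sed 's/\([ "(){}$#\&~'"'"']\)/\\\1/g'
}

# Filenames, one per line, containing spaces, quotes and brackets.
names='tag_curiosity collective.html
it'"'"'s a "test" (draft).html'

# Escaped form of the first name, for inspection.
escaped_first=$(printf '%s\n' "$names" | head -n 1 | escape_names)

# Round trip: xargs undoes the escaping and hands each name over as one argument.
# printf '[%s]\n' stands in for 'svn add' and prints one bracketed line per argument.
round_trip=$(printf '%s\n' "$names" | escape_names | xargs printf '[%s]\n')

echo "$escaped_first"
echo "$round_trip"
```

If each name comes back intact inside a single pair of brackets, the escaping has survived the trip.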

More Detail

In this blog, there is a file for every tag and for every blog date, so manually adding these automatically generated files (up to 6 or 7 per new post) was beyond the pale. However, there doesn't seem to be any command line approach as part of svn for just saying - look, the current filesystem is as I want it so accept that as the current version.

For me, that would imply the removal of any files which were currently missing, as well as the addition of any new files which were present.

My approach was to co-opt the output of 'svn status' into the input of 'svn add' and 'svn remove' respectively. Files which are gone (and which I want to stay gone) are reported by 'svn status' with an exclamation mark. Files which are there, but unknown to svn, are reported with a question mark.

This was a nice little scripting approach. However, a bunch of these file names have spaces in them (e.g. the tag page for 'curiosity collective' is called 'tag_curiosity collective.html'), and the spaces get in the way, confusing the svn add and remove operations. The various tricks possible with the 'find' command line tool were not an option, as the file names were output from svn directly.

Tagged: scripting, bash, special characters, svn