Regex != regex in sed (or: replacing digits in sed)


Published on December 14th 2018 - Listed in Shell Linux


This is a quick reminder to myself for the next time I run into this problem: regular expressions in sed are not exactly the same as elsewhere!

In my previous article "How to manually clean up Zoneminder events" I wrote a shell script in which I wanted to remove a certain part of a path:

/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12

should become:

/var/cache/zoneminder/events/5/18/12/14/06/45/12

Simple, right? Just use a sed substitution to remove ".448512/" from the string.

But see for yourself:

$ echo "/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12" | sed "s/\.\d+\///g"
/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12

The old path is still shown. Nothing was replaced. My first thought was, of course, that I had made a mistake in my regular expression, but all the online regex checkers confirmed that my regex was correct. For example on https://regexr.com/:

[Image: Regex match dot and digit]

I was able to narrow it down: the problem had to be the regular expression for the number (\d+), because simply replacing the dot character works:

$ echo "/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12" | sed "s/\.//g"
/var/cache/zoneminder/events/5/18/12/14/448512/06/45/12
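That narrowing-down step can be sketched as a small loop-free test script. This is only a rough sketch: the probe function is a made-up helper for illustration, not something from the original script, and the result of the last call depends on the sed implementation in use.

```shell
#!/bin/sh
# Sample path from this post.
path="/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12"

# probe (hypothetical helper): apply one candidate regex and report
# whether sed changed the input at all.
probe() {
    # '|' is used as the delimiter so slashes in the path need no escaping.
    out=$(printf '%s\n' "$path" | sed "s|$1||")
    if [ "$out" = "$path" ]; then
        printf 'no match: %s\n' "$1"
    else
        printf 'match:    %s\n' "$1"
    fi
}

probe '\.'        # literal dot only
probe '\.[0-9]'   # dot followed by one digit, bracket style
probe '\.\d'      # dot followed by a Perl-style digit class
```

Growing the pattern one piece at a time shows quickly which part sed stops understanding.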

And then I received the final hint from a friend: some common regex syntax doesn't work in sed! An excerpt from sed's documentation:

*    Matches a sequence of zero or more instances of matches for the preceding regular expression, which must be an ordinary character, a special character preceded by \, a ., a grouped regexp (see below), or a bracket expression. As a GNU extension, a postfixed regular expression can also be followed by *; for example, a** is equivalent to a*. POSIX 1003.1-2001 says that * stands for itself when it appears at the start of a regular expression or subexpression, but many nonGNU implementations do not support this and portable scripts should instead use \* in these contexts.

\+   As *, but matches one or more. It is a GNU extension. 

[...]

‘[a-zA-Z0-9]’  In the C locale, this matches any ASCII letters or digits.

So, first of all, the plus sign (+) must be escaped as \+, because sed uses basic regular expressions by default. And second, \d does not match a digit in sed; a bracket expression such as [0-9] must be used instead!

With these adjustments, sed finally does the replacement:

$ echo "/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12" | sed "s/\.[0-9]\+\///g"
/var/cache/zoneminder/events/5/18/12/14/06/45/12
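For reference, here are a few equivalent ways to write the same substitution. This is a sketch and not what the original script used: it assumes a sed that accepts the -E option (GNU and BSD sed both do), and the variable names are only for illustration.

```shell
#!/bin/sh
path="/var/cache/zoneminder/events/5/18/12/14/.448512/06/45/12"

# Basic regular expressions (sed's default): \+ is a GNU extension.
bre=$(printf '%s\n' "$path" | sed 's/\.[0-9]\+\///')

# Extended regular expressions via -E: + works unescaped.
ere=$(printf '%s\n' "$path" | sed -E 's/\.[0-9]+\///')

# More portable: the POSIX interval \{1,\} instead of \+, the character
# class [[:digit:]], and '|' as delimiter so the slash needs no escaping.
portable=$(printf '%s\n' "$path" | sed 's|\.[[:digit:]]\{1,\}/||')

printf '%s\n' "$bre" "$ere" "$portable"
```

All three variants should print the cleaned path. The last one avoids GNU-only syntax, which may matter if the script ever runs on a non-GNU sed.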

Dang it, I am sure that I ran into this at least once already in my Linux career. Hence this post, so I don't lose as much time the next time it happens.

