How to remove the first word of a line in Bash - comparing awk vs. cut vs. sed

Published on September 4th 2020 - Listed in Coding Bash Linux

Many roads lead to Rome, and the same applies to text manipulation in Bash. This short article compares how awk, cut and sed can remove the first word of a line.

The line itself is an output from another command - but it doesn't matter if the output comes from a file with content or from another command's stdout. As I'm currently working on fixing issue 9 of check_netio, I was looking for a generic way to remove the first word of a line:

root@linux:~# cat /proc/net/netstat
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPHPHits TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPSlowStartRetrans TCPTimeouts TCPLossProbes TCPLossProbeRecovery TCPRenoRecoveryFail TCPSackRecoveryFail TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPMemoryPressuresChrono TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPMD5Failure TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop PFMemallocDrop TCPMinTTLDrop TCPDeferAcceptDrop IPReversePathFilter TCPTimeWaitOverflow TCPReqQFullDoCookies TCPReqQFullDrop TCPRetransFail TCPRcvCoalesce TCPOFOQueue TCPOFODrop TCPOFOMerge TCPChallengeACK TCPSYNChallenge TCPFastOpenActive TCPFastOpenActiveFail TCPFastOpenPassive TCPFastOpenPassiveFail TCPFastOpenListenOverflow TCPFastOpenCookieReqd TCPFastOpenBlackhole TCPSpuriousRtxHostQueues BusyPollRxPackets TCPAutoCorking TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv TCPSynRetrans TCPOrigDataSent TCPHystartTrainDetect TCPHystartTrainCwnd TCPHystartDelayDetect TCPHystartDelayCwnd TCPACKSkippedSynRecv TCPACKSkippedPAWS TCPACKSkippedSeq TCPACKSkippedFinWait2 TCPACKSkippedTimeWait TCPACKSkippedChallenge TCPWinProbe TCPKeepAlive TCPMTUPFail TCPMTUPSuccess TCPWqueueTooBig
TcpExt: 0 0 0 105 0 0 0 0 0 0 2202215 0 0 0 6 211917 2726 446896 0 3 5747904 26883443 3968155 0 9122 0 704 0 0 0 0 83 333 7 0 21 2 9728 192 1702 12938 3042 0 72 0 446953 9 7961 11 518150 10 0 88 0 0 0 0 0 11 5637 52 0 0 0 22379 10366 21027 0 0 0 0 0 0 0 0 0 858733 631 0 8 0 0 0 0 0 0 0 0 0 1 0 1343155 0 0 0 2199 62739777 3573 69078 88 3242 3 0 8 0 0 0 7 0 0 0 0
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 21 8885283 227435 0 27399435386 27248510872 812 390952368 49094631 0 0 106478999 0 462174 0 0

Note that the lines start with an informational "TcpExt:" or "IpExt:" label. These labels need to be removed. Generally speaking: the first word of each line needs to be removed.

Remove the first word of a line with awk

With awk, the obvious approach is to print the fields manually and leave out the first field/word, such as:

root@linux:~# echo "first second third fourth fifth" | awk '{ print $2" "$3" "$4" "$5 }'
second third fourth fifth

But obviously this method only works if you know the exact number of words/columns in a line and you really like to type.

A better way is to use a for loop and tell awk where to start:

root@linux:~# echo "first second third fourth fifth" | awk '{for (i=2; i<NF; i++) printf $i " "; print $NF}'
second third fourth fifth

The for loop starts with the second field (i=2) and prints each field followed by a space until it reaches the last one. NF is a built-in awk variable which holds the number of fields in the current line, so $NF refers to the last field (the word "fifth" in this case), which is printed separately together with the newline. I agree, it looks complicated, but this can be used generally across all kinds of files or output, no matter the length of a line.
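Two caveats with the loop above: `printf $i " "` uses the field itself as the format string, so a field containing a `%` can be misinterpreted, and a line with only one word is printed unchanged because $NF is then the same as $1. A minimal sketch of a more defensive variant follows. The function name `drop_first_awk` is made up for this illustration and is not part of check_netio or the Stackoverflow answer:

```shell
# drop_first_awk: hypothetical helper, prints every field except the first.
# Fields are re-joined with a single space, so the original spacing is not kept.
drop_first_awk() {
  awk '{
    for (i = 2; i <= NF; i++) {
      # "%s" makes sure a literal % in the data is not read as a format
      printf "%s%s", $i, (i < NF ? " " : "")
    }
    print ""   # terminate the line
  }'
}

echo "first second 100% fourth" | drop_first_awk
```

With this variant a single-word line results in an empty line instead of the word itself. Whether that is the wanted behaviour depends on the use case.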

Source: Stackoverflow

Remove the first word of a line with sed

sed is another powerful command which comes with more functions than anyone would think of. The problem: using these functions is sometimes pretty "weird" and complicated, depending on what one wants to achieve (well, awk is not much better in this regard). However, for this particular use case of removing the first word of a line, the sed command is pretty easy:

root@linux:~# echo "first second third fourth fifth" | sed "s/^[^ ]* //"
second third fourth fifth

Basically sed is told here to use a substitution (= search and replace) function and to look for a run of "anything but a space" at the beginning of the line, followed by a single space. The "anything but" is defined with a special bracket expression: [^ ]. From the sed documentation:

A bracket expression is a list of characters enclosed by ‘[’ and ‘]’. It matches any single character in that list; if the first character of the list is the caret ‘^’, then it matches any character not in the list.

This means the substitution matches everything up to and including the first space and replaces it with nothing. In this case that is the first word at the beginning of the line.

Based on this question.
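The expression above only knows about the space character. If the input may be indented or separated by tabs, the POSIX character class [[:space:]] can be used instead. The following is a sketch under that assumption, and the function name `drop_first_sed` is hypothetical:

```shell
# drop_first_sed: hypothetical helper, not from the original question.
# Removes optional leading whitespace, the first word, and all
# whitespace (spaces or tabs) directly following that word.
drop_first_sed() {
  sed 's/^[[:space:]]*[^[:space:]]*[[:space:]]*//'
}

# The first word is followed by a tab here
printf 'first\tsecond third\n' | drop_first_sed
```

Single quotes are used around the sed expression so the shell does not touch any of its characters.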

Remove the first word of a line with cut

Hearing the command name "cut", one would think that this is the obvious command to use. Simply cut the first word off, right? And yes - it basically is that simple. There are two ways to achieve this with cut:

root@linux:~# echo "first second third fourth fifth" | cut -d ' ' -f 2-
second third fourth fifth

In the above example, cut is told to use a single space as field delimiter (-d ' ') to separate the words and to print field 2 and all later fields (-f 2-).

The other method is to "reverse" the cut command by saying it should print everything except the first field. This can be achieved with the additional parameter --complement, which is an extension of GNU cut and may not be available in other cut implementations:

root@linux:~# echo "first second third fourth fifth" | cut -d ' ' -f 1 --complement
second third fourth fifth
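One thing to keep in mind: cut treats every single space as a delimiter and does not squeeze repeated spaces, and it passes lines without any delimiter through unchanged. A short sketch of these edge cases (the sample strings are made up for this illustration):

```shell
# Two spaces after the first word: cut sees an empty second field,
# so the result still starts with one space.
echo "first  second third" | cut -d ' ' -f 2-

# A line without any delimiter is printed unchanged ...
echo "single" | cut -d ' ' -f 2-

# ... unless -s is given, which suppresses lines lacking the delimiter.
echo "single" | cut -d ' ' -s -f 2-
```

For input like /proc/net/netstat, where the words are separated by exactly one space, none of this is a problem.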

awk vs. sed vs. cut: Who's the winner?

That's the nice part: Every command is a winner. The goal was achieved and every admin or developer should use the command one prefers. But if there's a measurement to declare a winner, it's the time factor.

On a Debian 9 (Stretch) system with a current system load of almost 0, the different commands were run alongside the time command.

ck@linux:~$ time echo "first second third fourth fifth" | awk '{for (i=2; i<NF; i++) printf $i " "; print $NF}'; \
> time echo "first second third fourth fifth" | sed "s/^[^ ]* //"; \
> time echo "first second third fourth fifth" | cut -d ' ' -f 2-; \
> time echo "first second third fourth fifth" | cut -d ' ' -f 1 --complement

second third fourth fifth

real    0m0.004s
user    0m0.000s
sys    0m0.000s
second third fourth fifth

real    0m0.005s
user    0m0.000s
sys    0m0.000s
second third fourth fifth

real    0m0.003s
user    0m0.000s
sys    0m0.000s
second third fourth fifth

real    0m0.003s
user    0m0.000s
sys    0m0.000s

The same set of commands was run ten times with a random sleep time in between. This results in the following table (real time in seconds):

Run   awk      sed      cut      cut reverse
1     0.004    0.005    0.003    0.003
2     0.004    0.005    0.003    0.002
3     0.004    0.005    0.004    0.002
4     0.004    0.005    0.004    0.003
5     0.004    0.005    0.004    0.003
6     0.004    0.005    0.004    0.003
7     0.004    0.005    0.003    0.002
8     0.004    0.005    0.003    0.002
9     0.004    0.004    0.003    0.003
10    0.004    0.005    0.004    0.003
Avg   0.0040   0.0049   0.0035   0.0026

I'm actually quite surprised, but the winner according to the command runtime is clearly the "reversed" cut command! sed, on the other hand, is clearly the slowest command.
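A single short line mostly measures the startup time of each program. A rough sketch of how the comparison could be repeated on a larger input, assuming Bash (for the time keyword) and GNU cut (for --complement), could look like this. The sample size of 100000 lines and the variable name `sample` are arbitrary choices for this illustration:

```shell
#!/usr/bin/env bash
# Hypothetical benchmark sketch: create a larger input file by
# repeating the test line 100000 times (arbitrary size).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT   # remove the temporary file when the script exits
yes "first second third fourth fifth" | head -n 100000 > "$sample"

# Time each variant on the same file. The output is discarded so that
# printing to the terminal does not influence the measurement.
time awk '{for (i=2; i<NF; i++) printf $i " "; print $NF}' "$sample" > /dev/null
time sed "s/^[^ ]* //" "$sample" > /dev/null
time cut -d ' ' -f 2- "$sample" > /dev/null
time cut -d ' ' -f 1 --complement "$sample" > /dev/null
```

The numbers will differ between systems and awk implementations, so no results are listed here.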

Comments (newest first)

ck from Switzerland wrote on Sep 4th, 2020:

That is correct, cut does not do any (regex) parsing. And the program itself is also much smaller (hence quicker startup):

claudio@nas:~$ du /usr/bin/cut
44 /usr/bin/cut

claudio@nas:~$ ls -la /usr/bin/awk
lrwxrwxrwx 1 root root 21 Sep 14 2018 /usr/bin/awk -> /etc/alternatives/awk

claudio@nas:~$ file /etc/alternatives/awk
/etc/alternatives/awk: symbolic link to /usr/bin/mawk

claudio@nas:~$ du /usr/bin/mawk
120 /usr/bin/mawk

claudio@nas:~$ du /bin/sed
104 /bin/sed

However, comparing the commands with the output of /proc/net/netstat does not show a larger difference (the output of netstat is still small): 0.005s for awk, 0.005s for sed, 0.003s for both cut variants. But a much larger file would most likely show a larger time difference, agreed.

Michael Heiniger wrote on Sep 4th, 2020:

It's actually not surprising that the cut command wins, it does less work. It just searches for one well-defined character on each line and omits anything before the first one. It does not have to apply a regex for each character.
Also the startup time of the command has to be considered. There is not much parsing in cut, while sed and awk first need to parse the command you pass.
It would have been a bit more representative if you had piped in a copy of your netstat output rather than just 5 words.