
sed and awk complement each other, I think.

sed works on a line-by-line basis.

awk can work on a whole file: operations on later lines can depend on state carried over from earlier lines.

Each has its own operating domain, and you have to decide which tool best fits the task you have in mind.




Indeed: sed is useful for making small, line-wise tweaks to text. To be honest, I use it rarely (and this is largely because its regexp flavour leaves a lot to be desired). Things like deleting the header line[1S] or a simple replacement[2S]. It, like Awk, has some useful line targeting functions (e.g., print lines between two regular expressions[3S], etc.). Awk, on the other hand, is more like a finite state machine for text processing, with the notion of records and fields baked in[4]. You can do the same thing in Awk as in sed (see the [*A] references), but it's often easier in sed; and vice versa, some things that are easy in Awk would be very difficult or impossible in sed (e.g., [4], which prints the fifth field whenever the first field is "foo"). This doesn't even get into the multiline/statewise stuff you can do in Awk, but the examples would be too big/specific to fit into this comment.

I also learned recently that GNU Awk has networking support[5]. I have no idea why!

[1S] sed '1d'

[1A] awk 'NR!=1 {print}'

[2S] sed 's/foo/bar/'

[2A] awk '{sub(/foo/, "bar")} 1' (the trailing 1 is an always-true pattern whose default action prints each line; without it awk outputs nothing)

[3S] sed -n '/start_regex/,/end_regex/p'

[3A] awk '/start_regex/,/end_regex/ {print}'

[4] awk '$1=="foo" {print $5}'

[5] https://www.gnu.org/software/gawk/manual/gawkinet/gawkinet.h...
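For a small taste of the statewise processing mentioned above (a hypothetical example, not one of the numbered references): accumulating a per-key total across lines, something plain sed has no natural way to do.

```shell
# Sum the second field per first-field key; the sum array carries
# state across all lines, and the END block runs after the whole
# input has been read.
printf 'foo 1\nbar 2\nfoo 3\n' |
  awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}' |
  sort
```

This prints "bar 2" followed by "foo 4".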


[3A] awk '/start_regex/,/end_regex/ {print}'

can be simplified to:

awk '/start_regex/,/end_regex/'

because in awk, if no action is given, the default action is to print the lines that match the pattern. And if the pattern is omitted but an action is given, the action is applied to every line of the input.
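Both defaults can be seen with a quick illustrative pair of one-liners (made up for this comment):

```shell
# Pattern with no action: the default action prints matching lines.
seq 5 | awk '/3/'
# Action with no pattern: the action runs on every line.
seq 3 | awk '{print $1 * 2}'
```

The first prints "3"; the second prints "2", "4", "6".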



> [1S] sed '1d'

Similarly

sed 15q

will print only the first 15 lines of the input and then terminate. E.g.:

sed 15q file

or

some_command | sed 15q

So, when put in a shell script and then called (with a filename argument or via stdin):

sed "${1}q"

is like a specific use of the head command [1]: it prints the first n ($1) lines of standard input or of the filename argument given, where the value of $1 comes from the first command-line argument passed to the script. (Quoting "${1}q" keeps the shell from word-splitting the argument.)

[1] In fact, on earlier Unix versions I worked on (which, IIRC, did not have the head command), I used to use this sed command in a script called head - similar to tail.

And I also had a script called body :) to complement head and tail, with the appropriate invocation of sed. It takes two command-line arguments ($1 and $2) and prints (only) the lines in that line number range, from the input.
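A minimal sketch of what those two scripts might have looked like - a reconstruction from the description above, not the originals, written here as shell functions for compactness:

```shell
#!/bin/sh
# Reconstructions of the head/body scripts described above.

head_() {            # head_ N [file...]: print the first N lines
  n=$1; shift
  sed "${n}q" "$@"
}

body_() {            # body_ A B [file...]: print lines A through B
  a=$1 b=$2; shift 2
  sed -n "${a},${b}p;${b}q" "$@"   # the ;${b}q quits early after line B
}

seq 10 | body_ 3 5   # prints 3, 4, 5
```

The trailing q in body_ means sed stops reading as soon as the range is done, which matters on large inputs.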


I did not know gawk had networking support. I wonder if it could be used on network traffic on the fly, sort of like iRules on an F5. Thank you for sharing!


"sed is useful for making small, line-wise tweaks to text."

Couldn't agree more!

A great example of this was using sed with -i and 's///g' (surprised they weren't mentioned) while "cleaning" hundreds (seriously) of HTML/PHP files of injected content at a shared hosting provider.
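A self-contained sketch of that kind of cleanup - the injected tag and file names here are made up for illustration:

```shell
# Strip a (made-up) injected script tag from every .php file under a
# directory, in place, using find plus sed -i with an s///g expression.
# Note: -i with no backup suffix is GNU sed; BSD sed wants -i ''.
tmpdir=$(mktemp -d)
printf '<?php echo "hi"; ?><script src="evil.js"></script>\n' > "$tmpdir/index.php"

find "$tmpdir" -name '*.php' -exec \
  sed -i 's|<script src="evil.js"></script>||g' {} +

cat "$tmpdir/index.php"   # the injected tag is gone
rm -rf "$tmpdir"
```

Using | as the s-command delimiter avoids having to escape the slashes inside the tag.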


Honestly, that makes sense - doing multiline replacements with sed isn't very convenient (I believe it's possible if you treat newlines as ordinary characters by splitting the input on NUL instead). I guess I'll probably learn awk then; it can't be that hard with the examples from this repo^^
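That guess is close: GNU sed's -z flag reads NUL-separated records, so a newline is just another character that a replacement can match across (GNU-only, not POSIX):

```shell
# Replace a two-line sequence in one go; without -z, sed would see
# 'foo' and 'bar' on separate lines and the pattern could never match.
printf 'foo\nbar\nbaz\n' | sed -z 's/foo\nbar/qux/'
```

This prints "qux" followed by "baz".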


I use awk every day because I need state (I work with text files full of sections and subsections), but I am sure there has to be something better out there.
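For instance, the kind of section tracking described above might look like this - a hypothetical sketch, with a made-up "[section]" header format:

```shell
# Remember the most recent section header seen, and tag every
# data line with it; `next` skips printing the header lines.
printf '[one]\na\n[two]\nb\n' |
  awk '/^\[.*\]$/ {sec = $0; next} {print sec, $0}'
```

This prints "[one] a" and "[two] b".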

What is the definitive tool to process text? Perl? Haskell? Some Lisp dialect?


Biased because I've used Perl for over 20 years, but yeah, that's clearly one of its core reasons to exist. Having regular expressions built into the language syntax, instead of bolted on as a library, makes a big difference.


> What is the definitive tool to process text? Perl? Haskell? Some Lisp dialect?

Definitive? Being snarky, the one you have already installed and are familiar with. Like most I use Awk for one-liners, Perl if I need a little more or better regexes in a one- or two-liner. For the last several years I've been using TXR[1] if it gets complex. Lately I've been doing more fiddling with JSON than text and I'm using Ruby/pry and jq[2].

[1] http://www.nongnu.org/txr/

[2] https://stedolan.github.io/jq/


Hi; I replied to your Github gist quite a while ago:

https://gist.github.com/rlonstein/90d53fdeea31d2137737

about a matter related to the hash bang line in the script.

TXR has a nice little hack (that apparently I invented) to implement the intent of "#!/usr/bin/env txr args ..." on systems where the hash bang mechanism supports only one argument after the interpreter name.


Perl. It was designed for doing just that :)


Perl has an intangible "write once" property: since it allows extremely sloppy code under the "there's more than one way to do it!" mantra, nobody, including the original author, can debug it afterwards. Not even with the built-in Perl debugger. Perl encourages horrible spaghetti code.

In the interest of fair and accurate disclosure, I earned my bread for 3.5 years debugging Perl code for a living and I've also had formal education in Perl programming at the university. I would never want to do that again.


I spent my first 2.5 years out of college working on legacy Perl code and I cannot agree. Perl is a very nice language if you follow a coding style, and really, any language gets ugly pretty quickly if you don't. There's this adage that "some developers can write C code in any language", and it's probably similarly true that some developers can write Perl one-liners in any language. (In that legacy Perl codebase that I maintained, one of the developers was clearly writing Fortran code in Perl. He was doing everything with nested loops over multi-dimensional integer arrays.)


I experimented a bit with writing Erlang-style code in Perl. Wasn't terribly successful; pattern matching, even with regular expressions built into the language, is a fairly tough feature to emulate.


The problem is that with regexps you're generally still doing text matching, which is inefficient and error-prone. Perl's default exception mechanism allows text-based errors as well, so you end up doing text matching there too, unless you've decided on, and strictly use, exception objects by default (and even then you need to handle plain strings as you encounter them, e.g. by promoting them to an exception object of some sort). Objects at least allow you to definitively match on types. Perl's (now experimental) given/when constructs and smartmatch operator would help with this, but they've been plagued with problems for a long time now (or at a minimum are still not seen as production-ready).





