
I'm already a casual awk enthusiast, but I'm really hoping to find an opportunity to use it for a "real" software project soon. I've been reading the gawk user manual, and suffice it to say, the power and features of the language are dramatically underutilized in most of what people normally do with it (my most common use case is probably a hybrid of grep and cut).

https://www.gnu.org/software/gawk/manual/gawk.html
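
For anyone who hasn't seen the grep-plus-cut idiom, here is a minimal sketch. The helper name `shells_for` and the passwd-style sample lines are made up for illustration; the point is only that one awk process filters on a field and prints selected columns.

```shell
# Hypothetical helper: for every passwd-style line whose user name
# (field 1) starts with the given prefix, print the name and login shell.
# Roughly `grep "^prefix" | cut -d: -f1,7` in a single process.
shells_for() {
    awk -F: -v prefix="$1" 'index($1, prefix) == 1 { print $1, $7 }'
}

printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'alice:x:1000:1000::/home/alice:/bin/zsh' \
    'alan:x:1001:1001::/home/alan:/bin/sh' |
    shells_for al
```

Unlike grep, the match is anchored to one field, so a prefix that also appears in a home directory or shell path does not produce false hits.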



I wrote an IRC bot in it, one of those "paste a line of code and the bot will evaluate it and print the result" bots that you find in programming language channels. It's not a particularly big or "real" project, but it definitely fulfills the need of having a bot in that particular IRC channel.

awk is great for it because IRC (or at least the subset that the bot cares about) is relatively easy to parse, and shelling out to the shell script that does the actual code evaluation and prints the result back is also fairly straightforward. Someone else used to have such a bot before but they had written it in Rust with a bajillion dependencies; if I had done that I would've had to update dependencies and redeploy it every other week. In contrast I deployed my awk version once and then basically haven't touched it in years.
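
To give a feel for why the IRC subset is easy to parse, here is a rough sketch. It is not the bot's actual code; `parse_irc` and its tab-separated output format are assumptions made for illustration. It relies only on the basic line shape `:nick!user@host PRIVMSG target :message` and on PING lines needing a PONG reply.

```shell
# Hypothetical sketch, not the code of the bot described above.
# Reads raw IRC lines on stdin, answers PING, and prints
# "nick<TAB>target<TAB>message" for every PRIVMSG.
parse_irc() {
    awk '
    { sub(/\r$/, "") }                       # IRC lines end in CRLF
    $1 == "PING" { print "PONG " $2; next }  # keep-alive reply
    $2 == "PRIVMSG" {
        nick = substr($1, 2)                 # drop the leading ":"
        sub(/!.*/, "", nick)                 # drop "!user@host"
        target = $3
        msg = $0
        # Strip prefix, command and target; the message follows " :".
        sub(/^[^ ]+ +[^ ]+ +[^ ]+ +:?/, "", msg)
        printf "%s\t%s\t%s\n", nick, target, msg
    }'
}
```

A real bot would write the PONG and any replies back to the socket instead of stdout, and would hand `msg` to the evaluation script.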


This sounded interesting enough that I went and found the source code on GitHub: https://github.com/Arnavion/evalr. Starred. :-)

The following caught my attention in the bash wrapper:

  coproc GAWK {
          gawk ...
  }
  
  <&"${GAWK[0]}" openssl s_client -connect "$IRC_SERVER" -quiet >&"${GAWK[1]}"
This is a cool way to make awk talk over a socket that isn't specific to Gawk. For sockets without TLS you can replace openssl(1) with nc(1). I'll keep it in mind.

Edit: You can also use http://www.dest-unreach.org/socat/ and https://nmap.org/ncat/ with awk:

  socat "OPENSSL:$host:$port" 'EXEC:awk ...'
  
  socat "TCP:$host:$port" 'EXEC:awk ...'
  
  ncat --exec '/usr/bin/awk ...' --ssl "$host:$port"
  
  ncat -e '/usr/bin/awk ...' "$host:$port"


I recently wrote a program of slightly over 200 lines in portable AWK: https://gitlab.com/dbohdan/humsize. I wrote it for a specific operating system (NetBSD), but I have ended up using it everywhere; the portability helped with it. I can recommend AWK for small utilities that transform text in a line- and column-oriented manner and don't need libraries.

The main difficulties were making the command-line interface and testing. You can't have flags that begin with a dash in portable AWK without a shell wrapper, and I didn't want one. I settled on manually parsing key=value options, which I don't think are bad, just nonstandard. They look like this:

  humsize format=%6.1f%1s 'zero=  empty'
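
A minimal sketch of what manual key=value parsing can look like in portable AWK. This is not humsize's actual code; the `parse_opts` wrapper and the `opts` array are hypothetical. The idea is to walk ARGV in BEGIN, before awk itself interprets the operands, and delete the ones that were consumed.

```shell
# Hypothetical sketch of manual key=value parsing; not humsize's code.
parse_opts() {
    awk '
    BEGIN {
        for (i = 1; i < ARGC; i++) {
            eq = index(ARGV[i], "=")
            if (eq < 2) continue     # not key=value; leave it as a file operand
            key = substr(ARGV[i], 1, eq - 1)
            opts[key] = substr(ARGV[i], eq + 1)
            delete ARGV[i]           # keep awk from treating it as an assignment or a file
        }
        printf "format=[%s] zero=[%s]\n", opts["format"], opts["zero"]
    }' "$@"
}

parse_opts format=%6.1f%1s 'zero=  empty'
```

Run as shown, this prints `format=[%6.1f%1s] zero=[  empty]`. Collecting the options in an array rather than relying on awk's built-in var=value assignment lets the program reject unknown keys and see the values already in BEGIN.
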
There is no standard way to test AWK code. For testing I wrote a shell script that checks the program's outputs with grep: https://gitlab.com/dbohdan/humsize/-/blob/122aaed8d65dc8c285.... Don't do this; your tests should give the user (you) better feedback. You may think your program doesn't need anything but a couple of trivial tests that won't ever change; it is a pain when you inevitably are proven wrong. I should have instead had a directory with reference outputs and diffed against them to see what went wrong (my own example: https://github.com/dbohdan/initool/blob/72f65d3fde245ff8660c...).
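
The reference-output approach can be sketched roughly like this. The directory layout (`NAME.in` next to `NAME.out`) and the function name `run_golden_tests` are assumptions for illustration, not what either linked project does.

```shell
# Hypothetical layout: DIR/NAME.in holds the input, DIR/NAME.out the
# expected output. Prints a unified diff for every mismatch and returns
# non-zero if any test failed.
run_golden_tests() {
    dir=$1; shift                  # remaining arguments: the command under test
    failed=0
    for input in "$dir"/*.in; do
        expected=${input%.in}.out
        # The pipeline's exit status is diff's: 0 only when outputs match.
        if "$@" < "$input" | diff -u "$expected" -; then
            echo "ok   $input"
        else
            echo "FAIL $input"
            failed=1
        fi
    done
    return "$failed"
}
```

Something like `run_golden_tests tests "$AWK" -f prog.awk` (with `prog.awk` standing in for the real program file) would then show exactly which lines differ instead of a bare pass/fail.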

To ensure I didn't introduce portability issues, I set up testing against different awks in GitLab CI.

  image: debian:bullseye-slim

  before_script:
    - apt update
    - apt install -y busybox gawk mawk original-awk
    - ln -s "$(which busybox)" awk
    - busybox wget -O goawk.tar.gz https://github.com/benhoyt/goawk/releases/download/v1.21.0/goawk_v1.21.0_linux_amd64.tar.gz
    - tar xzvf goawk.tar.gz 

  test:
    script:
      - AWK=false ./test || true
      - AWK=./awk ./test
      - AWK=gawk ./test
      - AWK=./goawk ./test
      - AWK=mawk ./test
      - AWK=original-awk ./test
Edit: Rephrased and added a nicer shell test example.


I’m usually not a big side project guy, but I successfully used AWK to solve an IRL problem last year. It really helped solidify my understanding of the language.

The problem was that the Garmin GPS data for a bike ride I had just completed had split into multiple rides. I used AWK to stitch together the data into one file. I also did some basic linear interpolation to fill in missing data points.
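
For illustration, linear interpolation of missing samples fits in a few lines of AWK. This sketch is not the script described above; it assumes a simplified input of `seconds value` pairs sorted by time, and the name `fill_gaps` is made up.

```shell
# Hypothetical input: "seconds value" pairs sorted by time, one per line.
# Emits every sample, plus linearly interpolated samples for each whole
# second that is missing between two neighbours.
fill_gaps() {
    awk '
    NR > 1 {
        for (t = prev_t + 1; t < $1; t++) {
            frac = (t - prev_t) / ($1 - prev_t)
            printf "%d %.1f\n", t, prev_v + frac * ($2 - prev_v)
        }
    }
    { printf "%d %.1f\n", $1, $2; prev_t = $1; prev_v = $2 }'
}

printf '%s\n' '0 10' '1 12' '4 18' | fill_gaps
```

With the sample input, seconds 2 and 3 are filled in as 14.0 and 16.0, between the recorded 12.0 and 18.0.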

The GPS data is formatted as XML and I was able to parse it fairly robustly using AWK.


How did you parse XML with AWK? I would never think of using AWK for XML data. I'd even steer clear of CSV data unless I could guarantee no in-field commas or newlines.


Commas are easy if the field is quoted. I first run an awk script that uses " as the field separator and substitutes or deletes commas in the even-numbered fields, which are the ones holding the quoted text (as long as that's acceptable for your use case). Then, with `-F,`, I always check that NF is the same for all lines in the CSV before proceeding.
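
A rough sketch of that two-pass approach, assuming quotes are balanced, never escaped, and never span lines. Both function names are hypothetical. With `"` as the separator, the text before the first quote is field 1, so the quoted text lands in the even-numbered fields.

```shell
# Pass 1: replace commas inside quoted fields with semicolons.
# Assumes balanced, unescaped quotes that do not span lines.
strip_quoted_commas() {
    awk 'BEGIN { FS = OFS = "\"" }
    { for (i = 2; i <= NF; i += 2) gsub(/,/, ";", $i); print }'
}

# Pass 2: report every line whose field count differs from the first
# line's, and exit non-zero if there was one.
check_field_count() {
    awk -F, '
    NR == 1 { n = NF }
    NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
    END { exit bad + 0 }'
}
```

For example, `a,"b,c",d` comes out of the first pass as `a,"b;c",d`, which `-F,` then splits into three fields as intended.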

Depending on how the xml is structured, it can be possible to just pattern match on the tags if you have something simple to do.


Yes, this is it. I pattern-matched on tags to create a simple state machine. Then I extracted values by splitting lines on commas and quotes.
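
A sketch of what such a tag-driven state machine might look like. The actual file format and script are not shown in the thread, so this assumes GPX-like input with one tag per line and `lat` before `lon` in the `<trkpt>` tag; `trackpoints` is a made-up name.

```shell
# Hypothetical sketch for GPX-like input with one tag per line.
# Prints "lat lon ele" for every track point.
trackpoints() {
    awk '
    /<trkpt / {                    # state: entering a track point
        in_pt = 1
        split($0, q, "\"")         # q[2], q[4]: first two attribute values
        lat = q[2]; lon = q[4]; ele = ""
        next
    }
    in_pt && /<ele>/ {             # only meaningful inside a track point
        ele = $0
        sub(/^.*<ele>/, "", ele)
        sub(/<\/ele>.*$/, "", ele)
        next
    }
    /<\/trkpt>/ {                  # state: leaving, emit one record
        if (in_pt) print lat, lon, ele
        in_pt = 0
    }'
}
```

This only holds up while the exporter keeps producing the same line-per-tag layout; for arbitrary XML a real parser is the safer choice.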


I find that I tend to use AWK for text munging tasks that are too small to call a "project".



