Note that the awk script is far more general than the typical interview question, which specifies the numbers to be iterated in order. The awk script works on any sequence of numbers.
The "series of if statements" also has to read the line, split it, and parse an integer. To behave like the AWK script it also has to catch an exception and continue when the input cannot be parsed as an integer.
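For comparison, here is a minimal Python sketch of that read/parse/skip loop. The names fizzbuzz and run are illustrative, and skipping the line is only one reading of "continue"; the awk script itself is not shown in the thread. Like the awk version, it accepts any sequence of numbers, not just 1..N in order.

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def run(lines):
    """Yield one FizzBuzz result per input line, skipping unparsable lines."""
    for line in lines:
        try:
            n = int(line)  # int() tolerates surrounding whitespace and "\n"
        except ValueError:
            continue  # keep going instead of crashing on bad input
        yield fizzbuzz(n)
```

To drive it from standard input: `for out in run(sys.stdin): print(out)`.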
Go ahead, write the Python script that behaves exactly as this AWK program does. It will likely be about 4x as long, and that's with only a handful of different patterns and actions. More complex (and hence more situation-specific and harder-to-understand) use cases will benefit even more from AWK's defaults.
All of those mechanisms can be reproduced in a Python script, but they add up to a lot of boilerplate: mindless yet error-prone translation into standard-library calls or Python looping and conditional logic.
1. Your script crashes when it is given input that does not parse as an integer. The awk script does not. In this way, the awk design favors robustness over correctness, which is a valid choice to make at times.
2. How would you modify it so that it parses a tab-delimited file and does FizzBuzz on the third column? With awk it is a simple matter of setting FS="\t" and changing $0 to $3.
3. How would you modify it so that, instead of being output unmodified, rows whose $3 is neither fizz nor buzz output the result of a subprocess called with the second column's contents?
Now you might say that this is all goalpost-moving, but that's the point. AWK is more flexible and less cluttered in situations where the goalposts tend to get moved, but where the basic text processing paradigm stays the same.
1. Can Python's default behavior be reproduced as easily in awk?
2. You'd insert field = line.split('\t') at the beginning of the loop and then refer to field[2].
3. Use os.popen or subprocess.run.
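A rough Python sketch of answers 2 and 3 combined. The function names and the cmd parameter are hypothetical; the command to run is not specified in the thread, so it is passed in. It strips the trailing newline before splitting so the last column does not carry "\n", and it still raises ValueError on a non-numeric third column, as the Python version under discussion does.

```python
import subprocess


def fizzbuzz_word(n: int) -> str:
    """Return "Fizz", "Buzz", "FizzBuzz", or "" when n is none of those."""
    return ("Fizz" if n % 3 == 0 else "") + ("Buzz" if n % 5 == 0 else "")


def process_line(line: str, cmd: list) -> str:
    """FizzBuzz on the third tab-separated column; otherwise run cmd on column 2."""
    field = line.rstrip("\n").split("\t")
    word = fizzbuzz_word(int(field[2]))  # raises ValueError on non-numeric input
    if word:
        return word
    # Neither fizz nor buzz: output the command's stdout instead of the row.
    result = subprocess.run(cmd + [field[1]], capture_output=True, text=True)
    return result.stdout.rstrip("\n")
```

Usage, with echo as a placeholder command: `for line in sys.stdin: print(process_line(line, ["echo"]))`.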
I buy the "less cluttered" argument when the problem matches awk's defaults. I vehemently disagree with the "more flexible" argument. A problem perfectly suited to awk can easily turn into a poor fit with the addition of a single, seemingly innocuous requirement (e.g. in your subprocess example, log the standard error of your subprocess into a separate file).
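For reference, Python's subprocess.run accepts an open file object as its stderr argument, which is one way to meet that requirement. A minimal sketch; the function name run_logging_stderr and the choice to append to the log are illustrative, not from the thread:

```python
import subprocess


def run_logging_stderr(args: list, error_log: str) -> str:
    """Run args, return its stdout, and append its stderr to error_log."""
    with open(error_log, "a") as err:
        # stdout is captured for the caller; stderr goes straight to the file.
        result = subprocess.run(args, stdout=subprocess.PIPE, stderr=err, text=True)
    return result.stdout
```

The exit status is ignored here; check result.returncode if a failing command should be handled differently.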
at the beginning of the script. None of the other actions need to change, but with your implementation, every call to "int" needs to be changed to "intish".
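The intish helper itself is not shown in the thread. A hypothetical version that mimics awk's lenient string-to-number coercion (use the leading integer prefix, otherwise 0) might look like this; note that real awk converts to floating point, so "3.7" becomes 3.7 there but 3 here.

```python
import re

# Optional leading whitespace, optional sign, then one or more digits.
_LEADING_INT = re.compile(r"\s*([+-]?\d+)")


def intish(s: str) -> int:
    """Hypothetical awk-style coercion: the leading integer prefix, else 0."""
    m = _LEADING_INT.match(s)
    return int(m.group(1)) if m else 0
```

With this, a line like "abc" is treated as 0 instead of raising ValueError.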
I've got the following script (I stopped playing games with line breaks):
$ ./script.awk <<EOF
> thing1|0
> thing2|3
> thing3|7
> thing4|13
> EOF
FizzBuzz
Fizz
July 2018
Su Mo Tu We Th Fr Sa
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31
$ cat errors.txt
cal: 13 is neither a month number (1..12) nor a name
- What does the equivalent program in Python look like?
- How many characters does it have with respect to the number of characters in the awk script? (259 with shebang).
- How many characters would need to change to split by "," instead? (1 for awk). (You can achieve this in Python, but you'll end up spending characters on a utility function.)
- How many characters would need to be added to print "INVALID: " and then the input value for lines with non-numeric values in the second column, then skip to the next line? (55 for awk)
Character adds/changes are the best proxy for "flexibility" I could think of that doesn't go far afield into static code analysis.
I love Python and don't think awk is a good solution for extremely large or complex programs; however, it seems obvious to me that it is significantly more flexible than Python in every line-oriented text-processing task. Opinionated assumptions, built-in functions and automatically set variables, and the pattern-action approach to code organization all add up to a powerful tool that's still worth using in order to keep tasks from becoming large or complex in the first place.