
After a number of years using PowerShell, my conclusion is the opposite: text "scraping" is just better for most cases.

Normal shell usage means doing a lot of one-shot pipelines incrementally. This is just easier and faster to do when passing text around, because you can look at it and don't have to inspect some object for its type and what attributes/methods it provides. Parsing the text is not the problem here (although many people think it is); reasoning about what we're trying to do is.

And the over-verbose syntax doesn't help.

I compare this to human languages. It would be tempting to create a language with minimum ambiguity and clear cut concepts, but it wouldn't be practical. I guess PowerShell as an interactive shell is somewhat like this.

For automation, PowerShell is nice. The language sits between a shell script and going the Perl/Python route, but I still prefer shell scripts for simpler things and Perl/Python for more complex tools.

Having said this, PowerShell is the only sane choice on Windows and has made my life easier by no small amount. I never enjoyed managing Windows servers with their focus on GUI tools and their terrible CLI. PowerShell changed that and even "invaded" products like SQL Server and Exchange, making them nice to manage as well.



I'm not entirely convinced that "plain text" is "simple." For one, UTF-8 is a variable-length encoding, which can cause all sorts of subtle bugs and sometimes leads to security issues because somebody failed to parse a character correctly somewhere. On top of this, using plain text means that every program has to choose its own control characters, essentially an encoding within an encoding. This is great for readability, but it's not always easy to know how a certain program will treat certain corner cases, or sometimes how it will treat the input at all. I'm not sure what you mean by "you can just look at the text," because unlike something that encapsulates functionality alongside the data, text input into a program might do anything.


> For one, UTF-8 is a variable-length encoding, which can cause all sorts of subtle bugs and sometimes leads to security issues because somebody failed to parse a character correctly somewhere.

UTF-8 is nice in that extended characters, despite consisting of multiple bytes, will never contain a low-ASCII character amongst them. Unless you're dealing with byte offsets, splitting and scanning UTF-8 strings by delimiters works just like ASCII.
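
To illustrate (a minimal PowerShell sketch; the strings are made up, and any byte-oriented splitter behaves the same way), the delimiter byte can be located directly in the encoded bytes without decoding first:

    $utf8  = [Text.Encoding]::UTF8
    $bytes = $utf8.GetBytes('naïve,café,日本')
    # Split on the comma byte (0x2C). UTF-8 lead and continuation bytes are always >= 0x80,
    # so a multi-byte character can never be mistaken for the delimiter.
    $fields = @(); $start = 0
    for ($i = 0; $i -le $bytes.Length; $i++) {
        if ($i -eq $bytes.Length -or $bytes[$i] -eq 0x2C) {
            $fields += $utf8.GetString($bytes, $start, $i - $start)
            $start = $i + 1
        }
    }
    $fields    # -> naïve, café, 日本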

> because unlike something that encapsulates functionality alongside the data, text input into a program might do anything.

On the other hand, that "encapsulated functionality" is even more hidden and non-obvious. Unlike a stream of bytes that you can easily inspect simply by dumping them into a file, PowerShell objects are far more opaque entities.


Is there no trivial way to print a PowerShell object? As JSON or XML or an ASCII table formatted into columns? That seems like a point of friction, then, I'd agree.


By default, an object at the end of a pipeline will be printed to the screen in tabular format.

    (australis) ~\Desktop % Get-ItemProperty e
        Directory: C:\Users\Dustin\Desktop
    Mode                LastWriteTime         Length Name
    ----                -------------         ------ ----
    d-----        8/12/2016   9:06 AM                e
I believe types can specify which columns to display by default; if you want more info, there's always `Format-List`:

    (australis) ~\Desktop % Get-ItemProperty e | format-list
        Directory: C:\Users\Dustin\Desktop
    
    Name           : e
    CreationTime   : 8/11/2016 10:58:56 PM
    LastWriteTime  : 8/12/2016 9:06:13 AM
    LastAccessTime : 8/12/2016 9:06:13 AM
    Mode           : d-----
    LinkType       :
    Target         : {}
It's also possible to format any object via `ConvertTo-JSON`:

    (australis) ~\Desktop % Get-ItemProperty e | ConvertTo-JSON
    {
        "Name":  "e",
        "Parent":  {
                       "Name":  "Desktop",
                       "Parent":  {
                                      "Name":  "Dustin",
                                      "Parent":  "Users",
                                      "Exists":  true,
                                      "Root":  "C:\\",
                                      "FullName":  "C:\\Users\\Dustin",
                                      "Extension":  "",
  ...


> I believe types can specify which columns to display by default;

Indeed, the default view (table/list/custom) can be specified, and for the view the default properties can be specified.

Consider how 'ls' produces a sequence of FileSystemInfo objects (when run against a disk file system). The view specifies that the list be grouped by "parent".

On the terminal this gives a nice list a la:

        Directory: C:\Dell\Drivers\T7MFF\PhysX\files\Engine\v2.8.0


    Mode                LastWriteTime         Length Name
    ----                -------------         ------ ----
    -a----       12-12-2014     07:42         362232 NxCooking.dll
    -a----       12-12-2014     07:42        5520120 PhysXCore.dll


        Directory: C:\Dell\Drivers\T7MFF\PhysX\files\Engine\v2.8.1


    Mode                LastWriteTime         Length Name
    ----                -------------         ------ ----
    -a----       12-12-2014     07:42         363256 NxCooking.dll
    -a----       12-12-2014     07:42        5823736 PhysXCore.dll


        Directory: C:\Dell\Drivers\T7MFF\PhysX\files\Engine\v2.8.3


    Mode                LastWriteTime         Length Name
    ----                -------------         ------ ----
    -a----       12-12-2014     07:42         339192 PhysXCooking.dll
    -a----       12-12-2014     07:42         412408 PhysXCooking64.dll
    -a----       12-12-2014     07:42        5952248 PhysXCore64.dll

However, the underlying stream is still just a continuous stream of FileSystemInfo objects. It is the terminal display format definition that causes the formatting to break and write a directory heading whenever the next item has a different parent from the previous one.
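
One way to see that the grouping is purely a display artifact (a sketch; the path is taken from the listing above and the property picks are arbitrary):

    PS> ls -r C:\Dell\Drivers\T7MFF\PhysX | sort Length -Descending | select -First 3 Name, Length

The per-directory headings disappear; the downstream cmdlets just see one flat stream of objects.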


You can certainly print them, but there's no guarantee you'll get all their contents by default, and there's still the problem of how to create those objects from some other output, e.g. in a file.

http://windowsitpro.com/powershell/powershell-objects-and-ou...

It could be said that, in some ways, unstructured streams are far more WYSIWYG, which can conceptually mean easier understanding and use.


Or it can mean a lot harder to understand and use. With types you get all of the metadata that you might not even have if it were plain text. In addition, every application chooses its own serialization format. In the end, the only programs that compose well are those that work on text, not structured data (e.g. grep, awk, sed, etc.).


> You can certainly print them, but there's no guarantee you'll get all its contents by default, and there's still the problem of how to create those objects from some other output, e.g. in a file.

That's what CliXML is for. Export-CliXml writes objects to a file. Import-CliXml reads them back.


I think they're assuming the default view output is being used for Export-CliXML, when really it's serializing the object passed to it.


You can print them, but you can't serialise them in a way that can be later unserialised.


> You can print them, but you can't serialise them in a way that can be later unserialised.

Nope, that's wrong:

    Export-CliXml
    Import-CliXml
(https://technet.microsoft.com/en-us/library/hh849916.aspx)

For instance:

    PS> ps|Export-Clixml myprocesses.xml
    PS> Import-Clixml .\myprocesses.xml

    Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
    -------  ------    -----      -----     ------     --  -- -----------
        297      15     3856      13304              4024   0 aeagent
         88       7     1420       4808              3804   
    ...
CliXml (Command Line XML, IIRC) is a serialization format for PowerShell objects. It is the format used to transfer objects to/from remote machines when PS remoting is used.

Granted, the re-hydrated objects are not the original objects. Most methods will not work after re-hydration, as the object has been ripped from its context. Think ProcessInfo (the objects returned from 'ps' (Get-Process)): in its original form it can be used to interact with the process, e.g. terminating it. In the rehydrated form it can be used only to inspect properties.

Speaking from experience, this is rarely a problem. CliXML works remarkably well.
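
For example (a quick sketch; the file name is arbitrary and the exact error text varies by version):

    PS> Get-Process -Id $PID | Export-Clixml proc.xml
    PS> $copy = Import-Clixml proc.xml
    PS> $copy.ProcessName    # properties survive rehydration
    PS> $copy.Kill()         # fails: the deserialized object no longer carries the method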


> Normal shell usage means doing a lot of one-shot pipelines incrementally.

Sure, because no one ever takes those "one-shot" pipelines full of `cut`s and `grep`s and distributes them in real products. /s

As far as I can tell, Powershell is designed for scripts that are meant to be distributed, so they must be robust. Normal 'text scraping' Bash is shit at that.


If we are not talking about interactive use, then to distribute "robust" scripts you could use the nicer and richer Perl and Python ecosystems.

For interactive use, the conciseness, the human-readability of input/output and the fault-tolerance to unrelated changes in the input/output formats provided by Unix shells are also preferable.


The motivation for PowerShell was to have something good at both. Why switch contexts when moving from interactive use to coded automation if you don't have to?


Because the requirements change depending on which one you're doing: quick shell automation isn't the same as the kind of automation you'd use perl/python for.


And yet most of the time I see bash being used for the same kind of automation that you think is good for perl/python.

And not because bash is good, but because it is what people know (since they use it interactively) and because it is installed everywhere.

So if someone had a tool that was installed everywhere, and used interactively, that could also be used to create more robust automation tasks, that seems like a win to me.


It depends on how complex the automation is. I'd no doubt use a pipeline in places where you'd use perl/python.

As for interactive use and robust automation, Bash isn't as bad as you'd think. The reason I'd go to python is because of script complexity, not lack of robustness.


> text "scraping" is just better for most cases.

Unix pipes handle bytes, not just text. For instance, copy a hard disk partition to a remote hard disk via ssh:

  (dd if=/dev/hda1) | (ssh root@host dd of=/dev/sda1)
The KISS principle ("Keep it simple, Stupid") is the best way in many cases. In Unix you can quickly enter a pipe without special coding which does also non-trivial stuff. For instance,

  find -name "*.xls" -exec echo -n '"{}" ' \; | xargs xls2csv | grep "Miller" | sort 
gets a sorted list of all entries in all Excel files which contain the name "Miller", no matter how deeply the files are located in the directories. Can you do this in Powershell quickly? I don't know, I am actually curious.

Objects in pipes are convenient and powerful. However, for most applications of pipes they are probably overkill. If things get tough you can simply use files rather than objects.

I would not be surprised if many Windows users end up preferring bash pipes from the Ubuntu subsystem in Windows 10 over PowerShell, because they are much handier for most pipe applications.


> Can you do this in Powershell quickly? I don't know, I am actually curious.

You can basically do the same thing with Powershell. I don't know of a 'built-in' module to handle XLS->CSV conversion, so you need to bring one in:

  Install-Module ImportExcel
Then:

  ls *.xlsx -r | %{ Import-excel $_ } | ? { $_.Name -eq "Miller" } | sort
That's my naive, Powershell noob approach anyway.

There's another option however, where you can leverage Excel itself (granted, this is likely to be a Windows only approach):

  $excel = New-Object -com excel.application
And you can now open XLS/XLSX files and operate on them as an actual excel document (including iterating through workbooks/ worksheets, etc.). It's all just objects.
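
A rough sketch of that approach (the file path is hypothetical, and Excel must be installed for the COM object to exist):

    $excel = New-Object -ComObject Excel.Application
    $excel.Visible = $false
    $wb = $excel.Workbooks.Open("C:\Reports\sales.xlsx")   # hypothetical file
    foreach ($ws in $wb.Worksheets) { $ws.Name }           # iterate the worksheets
    $wb.Close($false)
    $excel.Quit()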


Thanks for pointing this out. However, your example is not equivalent since "ls" searches only in the current directory. Nevertheless it is good to know that it's basically possible since it helps a lot to manage mixed Windows/Linux networks.


Not true. The -r flag I specified is for "recurse"; it will search the current directory and all sub-directories.


A better way to do that find command is:

    find -name '*.xls' -print0 | xargs -0 xls2csv


> I compare this to human languages. It would be tempting to create a language with minimum ambiguity and clear cut concepts, but it wouldn't be practical.

Lojban[1] is usable in practice (though that may be because it still allows you to be ambiguous when desired).

[1] http://lojban.org


> And the over-verbose syntax doesn't help

This is basically the reason I have not been willing to put any time into even trying to learn PS. I mean, trying to translate

unzip file.zip folder

to PS was a nightmare. Like:

[System.IO.Compression.ZipFile]::ExtractToDirectory($zipfile, $outpath)

I don't _want_ to learn to type stuff like that for simple shell scripts.


But that's picking an example where

a. You can still use the native binary just as well. Especially in this case.

b. There was no PowerShell cmdlet for it at the time. There also isn't one for executing Python code; you still have to call python. Or, if you're so inclined, load up IronPython's assembly and use an overly verbose .NET method call ...

c. There now is a cmdlet for doing so: Expand-ZipFile.

I'm actually curious as to those complaints: Do you never write a function in bash to abstract away something? A function in other languages? Is everything just a series of crystal-clear one-liners?

Shell scripts in Unix-likes tend to glue together a bunch of other programs and (usually) not do much programming in the shell's language. You can write PowerShell just the same, actually. It is a shell, after all.


If you're used to cmd or bash, the jump in verbosity is very noticeable and definitely gets in the way.

It might be a cultural thing, because I'm no fan of C# and its typical style either; no surprise, then, that I find PS syntax exceedingly verbose too. It just feels excessively bureaucratic and awkward to have to write so much. On the other hand, bash, awk, sed, and all the typical Unix commands and their associated ecosystem seem to "get out of your way" far more effectively.

> c. There now is a cmdlet for doing so: Expand-ZipFile.

That example already shows the verbosity increase clearly - why is it "Expand-ZipFile", and not "ZipFile-Expand", "Expand-Zip-Format-Archive", or something else? In contrast, "unzip" is short and easy to remember. The fact that a native binary might exist for a given task is irrelevant to the observation that the shell's language is itself more verbose.

> I'm actually curious as to those complaints: Do you never write a function in bash to abstract away something? A function in other languages? Is everything just a series of crystal-clear one-liners?

Abstraction helps reduce code duplication but is not useful when each line of the script is quite different, and in that case PS remains more verbose. Ultimately, the overhead is still higher.


> Why is it "Expand-ZipFile", and not "ZipFile-Expand", "Expand-Zip-Format-Archive", or something else? In contrast, "unzip" is short and easy to remember.

Iff you know it already. PowerShell is built around a few conventions. One is that all commands share the Verb-Noun pattern. The verb always comes first. Then there are a bunch of common verbs that all share the same basic meaning, regardless of context. Expand is such a verb; it is the opposite of the Compress verb. You may not like the choice of verb, but there are always going to be names you didn't choose, so that's probably a petty point. In the end, PowerShell makes it easy to discover commands you may only vaguely know exist. I'd argue that also helps remembering them once you know them. Just from knowing unzip you wouldn't be able to guess the command to extract a GZIP file, for example.

___________

P.S.: I have to retract part of my original post here. Expand-ZipFile does not exist natively in PowerShell. I stumbled across http://ss64.com/ps/zip.html and while that site does have documentation for all built-in cmdlets, I didn't read too closely and this was actually documentation for a wrapper around the .NET framework's own functionality. This does not change the discussion about the name and discoverability, though, except that New-ZipFile should probably use a different verb. Actually, the PowerShell cmdlets for handling ZIP files are Compress-Archive and Expand-Archive, exhibiting the properly mirrored verbs.
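
Discoverability in practice, using only standard cmdlets (output omitted):

    PS> Get-Verb Expand, Compress     # the approved verb pair
    PS> Get-Command -Noun Archive     # Compress-Archive, Expand-Archive
    PS> Get-Command -Verb Expand      # every command that expands something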


That was picking an example from my previous attempt to work with PS. I was not the admin of the server, so no installing extra programs or upgrading PS to the newest version.

The shell scripting I have needed has been typically very simple. Like scheduling a copy of this file from that server and running this command (which was precisely all I was trying to achieve the last time, only the file was compressed; I would have done it with a .bat, but that was even more difficult). As you say, using it as glue and building the complexity into the programs behind the shell.


Use this.

Expand-Archive file.zip
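
And with an explicit destination folder (the -DestinationPath parameter):

  Expand-Archive file.zip -DestinationPath folder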


This is all a big part of the reason why I personally advocate using a standardized text-based object serialization format (like YAML) for these sorts of things. In particular:

* Still text-based and human-readable, which helps for debugging/troubleshooting

* Still text-based and machine-readable, so it's inherently cross-platform (assuming that all said platforms agree on their text encoding)

* Still less troublesome than piping arbitrary text through `grep` or `sed` or `awk` or what have you

* Still provides the "everything is an object" benefit that's lost with arbitrary text streams

* Still orientable around streams of data, at least for YAML (by using the document delimiter)
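
For instance, a stream of records as a multi-document YAML file could look like this (the fields are made up for illustration):

  ---
  name: eth0
  state: up
  ---
  name: eth1
  state: down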


Passing CSV around can be a useful middle ground, if you're using utilities that all do proper escaping and allow newlines in strings etc., such as csvkit.


What is proper escaping for CSV?


The problem is that not all tools agree on this one, but: lines are separated by newlines, items are separated by commas. An item can be quoted (and this must be done if it contains a newline, comma, or double quote) by surrounding it with double quotes. Inside such quoted items, double quotes are escaped by doubling them.
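
For example, a header plus one record where the first field contains a comma and the second contains a double quote and a newline:

    name,note
    "Smith, Bob","said ""hi""
    over two lines"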


RFC4180.


DSV is the default format for many Unix utils, and AWK and cut parse it easily.


Yes, of course; that is one end of the "middle ground" I described. The problem is that in Unix, there is no common convention for how to escape the delimiter, so you cannot pass arbitrary strings. CSV can do this, with the proper tooling. Being able to pass around arbitrary strings is quite valuable, and although it is not quite the same as passing objects around, it has the advantage of still being human-readable text. That is why I called it a "useful middle ground" between Powershell and the delimiter-separated fields of Unix tools. However, you do need special tools such as csvkit, as Unix tools like awk, sort and cut cannot properly parse it.


Oh, you can pass arbitrary strings, you just can't pass arbitrary strings as part of compound data structures. The closest you can get is using null or some other weird character to separate records (RS='\0' in awk), and something similar for field separation (like -print0 in find and -0 in xargs, which corresponds to FS='\0' in awk). But \0 is only one special character, so you need another one.



