Note that calling ordinary write() multiple times has atomicity implications - if you have multiple processes writing to STDERR, you might find their output interleaved in unfortunate ways. Using a single write() with a temporary buffer, or writev(), avoids this (provided you're not writing more than PIPE_BUF bytes total).
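For illustration, a minimal sketch of the single-buffer approach (a hypothetical helper, not the library's actual code; the escape strings are the usual ANSI red/reset sequences):

    #include <string.h>
    #include <unistd.h>

    /* Assemble color + message + reset in one buffer so the kernel
       sees a single write(). For pipes this is only atomic up to
       PIPE_BUF bytes (POSIX guarantees at least 512). */
    static ssize_t write_red(int fd, const char *msg, size_t len) {
        char buf[4096];
        size_t total = 5 + len + 4;      /* "\033[31m" + msg + "\033[0m" */
        if (total > sizeof buf)
            return write(fd, msg, len);  /* too big to be atomic anyway */
        memcpy(buf, "\033[31m", 5);
        memcpy(buf + 5, msg, len);
        memcpy(buf + 5 + len, "\033[0m", 4);
        return write(fd, buf, total);
    }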
You are right, and there are other issues:
- I don't return the proper return value (it shouldn't include the bytes from the color strings)
- I don't handle the case where writev() returns a byte count between 0 and count (e.g. you're writing 10 MB of data, a signal arrives, and only 5 MB gets written -- the color thing will get all confused)
The reason would be interleaving if more than one process is writing to stderr.
I was more bothered by the use of alloca(). Even on a 32-bit system with a 32-bit size_t, you can blow your stack to hell with a single call to this modified write(). Having to do a full malloc() here would be pretty lame, but as others suggested, writev() should do nicely.
The other nice property of using writev() is that you don't have to bother with the dlopen/dlsym crap. You can just call writev() directly from the overridden write() in either case.
Of course, that doesn't catch people writing to stderr using writev() directly, but I think that's ok. (The current impl doesn't catch that case anyway.)
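Roughly, the interposed write() could then look like this -- a sketch of the idea, not the library's actual code; STDERR_COLOR and COL_RESET stand in for its constants:

    #include <sys/uio.h>
    #include <unistd.h>

    #define STDERR_COLOR "\033[31m"
    #define COL_RESET    "\033[0m"

    /* Interposed via LD_PRELOAD. Calling writev() rather than write()
       avoids recursing into ourselves, so no dlsym(RTLD_NEXT, "write")
       lookup is needed. */
    ssize_t write(int fd, const void *buf, size_t count) {
        if (fd != STDERR_FILENO) {
            struct iovec plain = { (void *)buf, count };
            return writev(fd, &plain, 1);
        }
        struct iovec iov[3] = {
            { STDERR_COLOR, sizeof STDERR_COLOR - 1 },
            { (void *)buf,  count },
            { COL_RESET,    sizeof COL_RESET - 1 }
        };
        /* caveat discussed above: on success this count includes the
           color bytes, and partial writes aren't handled */
        return writev(fd, iov, 3);
    }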
Like bdolan said, it would have atomicity implications done this way. I agree it would be lighter on memory, but in some cases output might break if many processes were writing to stderr at the same time.
One problem is stdio buffering. By default the C stdio functions will flush buffers after every line for file descriptors that are connected to terminals, but only every now and then for file descriptors connected to disk files or pipes. So if a program outputs messages on both stdout and stderr (like a compiler), then after filtering one of the streams through a pipe, the messages are not interleaved properly.
It uses regular select() semantics and a trivial state machine to pick the active fd, and has an XML mode, which is convenient for colourising in XHTML (if you want to record an automated process, say).
    #include <stdio.h>

    int main() {
        while (1) {
            fprintf(stdout, "Hi.\n");
            fprintf(stderr, "Hello.\n");
        }
        return 0;
    }
If you run this program directly from the shell, every odd line says Hi and every even line says Hello. But if you run it through sexec (on a Linux machine at least), you get a large block of lines saying Hi, followed by a large block saying Hello, etc. Whether this is a problem or not of course depends on the use case. But to avoid it you need something more sophisticated than just pipes, e.g. the LD_PRELOAD hack in the original post.
Without looking at its source, what 'unbuffer' does is likely allocate a PTY for the things running under it. That's how the rest of Expect works, after all ;)
I don't know, I kind of have mixed feelings about adding another stream. I'd probably find stdinfo and stdwarn useful. But it's a slippery slope to duplicating the features of a full logging system; you might add stdinfo, stdwarn, stdfine, stdfiner, stdfinest, stdconfig, std$custom, etc. etc.
One nice thing about having only stdout and stderr is it pushes you to ask: "Is this piece of information actually necessary to display to the user or should it just be logged to a file somewhere?" I'm not sure everyone really asks that though. Every time someone shoves a bunch of needless information to stderr "just in case" is a time I have to 2> /dev/null. (At least the useless stuff is often in stderr so I don't have to grep it out of stdout.)
Though I'm not a zsh user, my first reaction was likewise "meh, my shell can do that", and so I whipped up a pretty much identical command (only using sed). However, when it didn't behave quite as I expected, I dug a little deeper and ended up with (what was to me) a curious discovery: bash prints its prompt to stderr rather than stdout (see http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y#n5013).
That's a great question. Zsh and tcsh both use stdout for their prompt, ksh and dash use stderr for the prompt itself but stdout for whatever the user is typing into the prompt. Bash, at least as it's configured on my system, seems to use stderr for the prompt and whatever the user types on it.
I really can't think of a reason that the way tcsh and zsh handle this isn't the proper way. Surely the principle of least surprise applies here.
That is a good point. I don't know if write() has a limit on 'count' (I can't find anything saying that it does), but if it's simply "whatever fits in a size_t" then you certainly want to be careful about putting that on your stack. A failed alloca() doesn't really give you any indication, though you might SIGSEGV later.
This actually doesn't work for processes using stderr on eglibc 2.13 - fprintf internally outputs via a different path that can't be hooked in this manner. On x86, you could try overwriting the system call gate pointer to a hook function, but on x86_64 you're basically stuck, unless you attach a debugger or something.
Also, the write prototype is incorrect (using int instead of size_t) and could break on 64-bit machines.
The concept may be useful for someone, but the implementation is slow, naive, and definitely not something you want to interpose your libc write() function with. If the author is reading this: 'man 2 writev' for a quick code and efficiency improvement.
I am not a C programmer. Actually, this piece of code is my first C in over 10 years. I might have forgotten all the good practices and such. But it works for me, so I shared it. Would you like to contact me and point out the slow, naive parts -- or, even better, improve it and send a pull request?
The slow point mainly refers to allocating and copying into a new buffer. The naive points refer to the facts that you 1) use alloca() to allocate an unknown amount of memory on the stack and 2) assume that write() is atomic, never fails, and always writes the whole buffer (you return count). Instead you should return what write() actually wrote, or its error. There's also a whole can of worms around what happens on a signal interrupt, or if write() only wrote part of the buffer -- in which case the color reset code may never be sent, etc.
Solutions to those require some careful thought, but for the slow part, you can simply replace all the allocation and copying with something like this:
    /* needs <sys/uio.h> and <errno.h>; iov_base is void *, hence the casts */
    struct iovec iov[3] = {
        { (void *)STDERR_COLOR, STDERR_COLOR_SIZE },
        { (void *)buf,          count             },
        { (void *)COL_RESET,    COL_RESET_SIZE    }
    };
    ssize_t n;
    do { n = writev(2, iov, 3); } while (n == -1 && errno == EINTR);
    return n;
While it might not be relevant in practice, there's the possibility of data corruption if someone closes all file descriptors (for whatever reason) and then operates on the newly created ones. This implementation just checks for file descriptor number 2; a more sensible implementation should hook the stdio FILE or iostream mechanisms to restrict colorization to the actual "standard error" case.
    for (i = 0; i < 1024; i++) close(i); /* close a lot of fds */
    /* open() will yield the lowest free fds; here that's 0, 1, 2 */
Well, I suppose it's one step up from embedding ANSI escape sequences in your output. Various crappy Ruby tools do this and it makes Windows development SUCH a joy.
Free clue: the world is not a 1960s UNIX terminal. If you don't want to support Windows, that's fine - don't. I'll use something else or write my own. If you DO claim to "work" on Windows, read the goddamn docs and don't just blindly emit escape sequences on std(out|err) because it DOES NOT WORK.
It's not that hard to support Windows console IO, and it's well documented. I've previously written a (commercial) Java native library to provide curses-style character-at-a-time support for Java running in a terminal, using curses on UNIXalikes and a choice of Win32 console or Cygwin curses on Windows, although Cygwin curses at the time didn't correctly recognise Ctrl-Space (makes supporting Emacs keybindings a non-starter). Win32 has a considerably saner console implementation IMO.
What's bizarre about it? Clearly this library isn't the problem. What'd be even better would be a sane console standard that isn't stuck being backwards compatible with a 7-bit everything-is-a-character protocol. The acoustic coupler is dead, people -- get over it! Relegate termcap / terminfo / curses / getty to the same bin we dumped Gandalf serial line multiplexors in, and move forward!
UTF-8 works quite nicely across terminals you know.
Anyway, if you want to discard the classical model terminal stack, invent something better first. You can't just declare that we should bin it without providing a viable alternative.
What has UTF-8 got to do with anything? I'm talking about 7-bit VT100 escape sequences for single keys (which in turn require millisecond-accurate timing code in e.g. curses to distinguish them from just pressing the same keys). UTF-8 is beautiful by comparison.
As for an API, see (http://msdn.microsoft.com/en-us/library/windows/desktop/ms68...). Java even uses the exact same virtual key codes for AWT KeyEvent. I suspect at some point they were just included from the equivalent Windows header but it's been a while since I looked at the Sun JVM source code.
Strictly speaking, "characters" in UTF-8 are not all 7 bits or 1 byte long, and they can be dumped across terminals. So my point is that another solution is technically possible without changing anything else.
Anyway, if the Windows terminal stuff works for you, then right on. It is not adequate for many of us, however. And for that matter, I agree with roel_v that the Windows method is much more cumbersome for even moderately complicated things.
Edit: Also, delayed interpretation is a fairly common alternative to timing.
Thank you, I'm well aware of how Unicode in general and UTF-8 in particular work.
While it would be possible to extend Unicode to carry terminal control sequences (or invent yet another standard to do so) my opinion is that having an out-of-band API is better than in-band sequences - most real terminals of the VT100 vintage had BOTH. This is no longer true, and the support for in-band control sequences is a source of much incompatibility. Remember when we replaced termcap with terminfo? Oh how we laughed.
What do you think is missing from the Win32 Console API, out of interest? I ask because I've written cross platform terminal components and ended up using curses on UNIX because that is the only sane way to interact with terminals; hardly a lightweight or even very portable solution, and that's before we start talking about licensing issues.
I won't argue that out-of-band communication has advantages, because it does. However, if you want the Windows method to have the same functionality as the traditional method, you are eventually going to have to send those things down the wire. (I don't know if it is already capable of doing this.)
So really, all it gets you boils down (as far as I can tell) to out-of-band control sequences^. At that point it hardly matters what exactly your control sequence specification is, and the Windows Console API essentially becomes the new curses (while inheriting many of the problems of the traditional system, unless you can magically keep everybody in sync).
That simply isn't worth replacing everything to achieve.
And to be blunt, the advantages that out-of-band control provides quite likely do not outweigh the benefits afforded by in-band control for most of us. For example, observe the triviality of the subject of this discussion. Unless you are on the receiving end, I would say it really does make everything easier.
PS: If you use PDCurses then you shouldn't have GPL issues. Sticking to (at least the features provided by) PDCurses will also clear up most portability issues.
^ As an example of what I mean, IIRC you can check out what ssh does when you give it the -t flag. There is some out of band communication in the traditional system after all.
I don't think we'd reinvent or replace all of curses, as it does more than raw terminal handling (e.g. logical windowing). In my mind it's a layer on top of an area of displayed text. Win32 console doesn't implement any of that either.
I'm talking about the layer that interprets the in-band escape sequences of actual type-able characters into logical functions (e.g. ESC[4D into "cursor back 4 spaces"), and the databases of long-obsolete hardware quirks that go with it. Replacing that with something closer to the actual keyboards people use (e.g. vt100 didn't HAVE F1-F4 keys, the codes for them are another non-standard area) shouldn't be so dramatic a change, especially when what we think of as remote text terminals these days are typically two fully featured computers running SSH. Perhaps a remote terminal API should just be an SSH protocol extension?
It's quite a pain, though, to get console coloring to work properly on Windows. I once wrote a utility that batch scripts could call to set the fore/background color (just a call to SetConsoleTextAttribute(), really), but in practice it's awkward to write scripts that color their output properly (especially once you start mixing the languages those scripts are written in). The terminal-emulator-that-interprets-escape-sequences-as-color-change-commands is much easier for that purpose.
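For reference, the call in question is about this small -- a minimal sketch that sets the foreground red and then restores the previous attributes:

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        CONSOLE_SCREEN_BUFFER_INFO info;
        GetConsoleScreenBufferInfo(h, &info);   /* remember old attributes */
        SetConsoleTextAttribute(h, FOREGROUND_RED | FOREGROUND_INTENSITY);
        printf("this line is red\n");
        SetConsoleTextAttribute(h, info.wAttributes);  /* restore */
        return 0;
    }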
Actually, no - I use Linux on the desktop at work, and Windows by choice elsewhere. However, after more than 10 years of pain I've had it with cross-platform code that isn't. I KNOW that this library isn't SUPPOSED to be cross-platform, that was my point - at least this is better than hardcoded ANSI escapes in strings!
It isn't a horrible mangling of pipes and/or file descriptors, or some custom wrapper script that you have to run before every command, or any of the other frightening ways to do this that I've seen. It's a simple library that you can load or unload easily, that works system-wide, without being an impenetrable, unmaintainable mess. As far as these things go, that's clean.
Yeah, honestly this is a tad hackish for my tastes; it makes too many assumptions.
The way I came up with while brainstorming a while ago (never got around to fully implementing it) is to create a PTY wrapper that you can fire off a program with.
Basically it creates a new PTY, wires it up to its own controlling terminal, and forks off a child under the new one. You can then trivially do a lot of things transparently, like separating stdout and stderr (normally both stdout and stderr would be attached to the slave side of the PTY (/dev/tty), but if you attach them to different fds then the select() on the master/parent side can do things like color them).
EDIT: if it helps you picture it, this is basically doing userland STREAMS ;)
I'd estimate you could do this for around 250 lines of C. Maybe I'll see what I can do after dinner.
(zsh works with it, but bash currently doesn't, for some stupid reason -- something about it expecting stderr to be /dev/tty, I believe.)
This is rough code; if you want to use it regularly/seriously, I suggest reviewing it first. I didn't pay much attention to standards compliance, and I've only tested on Linux so far. I know that PTY stuff can get hairy on different *nixes, so beware of that.
EDIT: whoops, remove that '-g -lefence' from the Makefile too before you use this.
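For anyone curious, the skeleton of such a wrapper really is small. A minimal sketch of the idea (not the poster's code), assuming Linux's forkpty() from <pty.h> (link with -lutil); it omits stdin relaying and the stdout/stderr-separation trick described above:

    #include <pty.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        if (argc < 2) { fprintf(stderr, "usage: %s cmd [args]\n", argv[0]); return 2; }
        int master;
        pid_t pid = forkpty(&master, NULL, NULL, NULL);
        if (pid < 0) { perror("forkpty"); return 1; }
        if (pid == 0) {                    /* child: stdio is the PTY slave */
            execvp(argv[1], &argv[1]);
            perror("execvp");
            _exit(127);
        }
        char buf[4096];                    /* parent: relay the PTY output */
        ssize_t n;
        while ((n = read(master, buf, sizeof buf)) > 0)
            write(STDOUT_FILENO, buf, n);  /* filter/colorize here */
        return 0;
    }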
You'd have to modify every script you ever run on a machine, and modify every command you ever type in, to make this work. Any serious solution to this problem needs to be 100% transparent to the user - i.e., wrapper programs are not an option.
This is because the values of stderr/stdout are inherited unless they are purposely overwritten (by redirection, or by anything that allocates its own PTY, like 'ssh' -- both of which of course disable the 'red-ification').
Anyway, neither of these solutions is something I would ever deploy in an "always on" setup (although it could be done with mine as well). You certainly wouldn't want to do it for your users, so they're always going to have to type something (or mess with their shell's init files, which should work for mine as well).
So this is about as transparent to the user as I dare make it. Unfortunately it is not 100% transparent to the programs you run under it, since they can figure out that stderr is not /dev/tty. Nothing seems to care, with the exception of bash, which inexplicably uses stderr for seemingly everything.
If you split stdout and stderr, you can end up with the order of output changing - your output process might not get around to reading its input ends until it has data on both, and then it can't tell which came first.
With the stderred LD_PRELOAD alias, this works for me:

    stderred python -c 'import os; print "Yo!"; os.write(2, "Jola\n\r")'

but this doesn't:

    stderred ruby -e 'puts "Yo!"; STDERR.write "Jola\n\r"'

If I strace both, the writes go to the same file descriptors... any ideas?
Not all of it, just stderr (so not stdout). This is useful because many things use stdout for regular output and stderr for error output (some things just use stdout for both, but that's not very nice as it disables doing things like this).
Whoops. I thought this was a tool for script authors, not the end user. Now I get it, and I think it's way cool. And I think J_Darnley doesn't like red.
I'm getting terribly confused by your confusion, so forgive me if I'm not addressing your concern...
If the program is mixing "info" level messages and "error" level messages (to use the syslog terms for them) on stderr, there really isn't much we can do about it. The understanding however is that things on stderr are probably things that the user wants to read.
Coloring both of these red shouldn't hide them any more than coloring both of them white, except now they are both easily discernible from the 'data' coming out of stdout (which is our goal).
A possible aside: the proper way to write to or read from the terminal when your stdin/stdout have been redirected is to just open /dev/tty yourself. Any program that does that will not have its text colored by any of the solutions I've seen here.
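For instance, a trivial sketch -- output written this way reaches the terminal regardless of redirection, and goes through a fresh fd that no write() hook watching fd 2 will see:

    #include <stdio.h>

    int main(void) {
        /* open the controlling terminal directly, bypassing any
           stdout/stderr redirection */
        FILE *tty = fopen("/dev/tty", "w");
        if (!tty) { perror("/dev/tty"); return 1; }
        fputs("this goes straight to the terminal\n", tty);
        fclose(tty);
        return 0;
    }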
I didn't mean that you can't run 32-bit binaries on a 64-bit OS; what I meant was that, AFAIK, you can't install 32-bit deb packages together with 64-bit ones on 64-bit Ubuntu. Correct me if I'm wrong.
You can install both. The new multiarch support builds blahblah:i386 and blahblah (implicitly blahblah:amd64) debs for source packages that have it enabled. Before that, packages that wanted to provide i386 binaries on amd64 wrote their own extra build support, and added a suffix to the binary package names to avoid collisions.
Here's what I found on how to support LD_PRELOAD without a $LIB dynamic string token: