As a point of contrast, I maintain a CI system at work with a plain-old-HTML (almost JavaScript-free) log browser, including ANSI colour highlighting (server-rendered), timestamp tooltips, real-time updating, etc.
Chrome has no problems loading >140k lines of logs (~10mb, biggest job I could find) and scrolling/searching/selecting/copying/etc. Obviously not as snappy as smaller logs, and also not as snappy as downloading the log file to browse in IntelliJ, but it works OK-ish. It's a lot higher than the ~20k line soft limit they described in the post!
The biggest difference I could find is that GitHub Actions' log browser generates 6 HTML elements with ~673 characters of HTML overhead per log line, whereas our own log browser generates 1 HTML element with ~45 characters of HTML overhead per log line (basically a single div with a timestamp attribute for a non-colored line). In that light I guess it's not surprising that browsers have more trouble with GitHub Actions logs given how much more work they are making the browser do to render each line of logs!
The minimal-HTML nature of our log browser was an intentional design goal. It's interesting what kind of limits you hit: you cannot use opacity (too slow), no rounded corners (too slow), one-div-per-line is faster than using spans and line wrapping (no idea why), no per-line JS logic (attach your event handlers to the parent container and wait for propagation). Kind of restrictive, but it's certainly nice not to have to implement an entire virtualized scroll window (which I have done before, and would rather not do again if possible!)
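For anyone curious, the one-div-per-line plus delegated-handler approach looks roughly like this. It's a sketch rather than our actual code; the `#log` container id and the `data-ts` attribute name are made up for illustration:

```typescript
// Sketch: one flat <div> per log line, one delegated handler on the parent.
// The "#log" id and "data-ts" attribute are illustrative, not our real names.
const container = document.getElementById("log")!;

// Roughly the minimal per-line markup: a single div carrying its timestamp.
function appendLine(timestamp: string, text: string): void {
  const line = document.createElement("div");
  line.dataset.ts = timestamp;   // timestamp attribute, surfaced later as a tooltip
  line.textContent = text;       // plain text; spans only get added for colored lines
  container.appendChild(line);
}

// No per-line listeners: a single handler on the container, relying on propagation.
container.addEventListener("click", (event) => {
  const line = (event.target as HTMLElement).closest<HTMLElement>("div[data-ts]");
  if (line) {
    console.log("clicked the line logged at", line.dataset.ts);
  }
});
```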
Vim would have no trouble with a 10mb log file. The only time I've really run into trouble opening a file with vim was when the file size exceeded the RAM on my machine (it was an XML data dump from Discogs). Even running macros and regex searches was no problem.
I am very excited to see if this helps. I use GitHub Actions constantly for very large builds with very large logs and I find their log browser infuriatingly useless... like: actively harmful :(. It punished me so much and so often and so consistently for having the gall to think that logs would be useful that I finally became demoralized and gave up trying to use their interface: I just wait until the builds complete (as they sadly refuse to do this during a build) and ask for the "raw log" so I can get the experience they seem to believe I couldn't possibly want: a ridiculously long text file downloaded into my browser, which allows me to use built-in browser features to scroll through and search it :(. I will, I guess, cautiously try to venture into using the logs again to see if they managed to fix things.
Not too long ago I had to search through a truly gigantic file (I believe it contained JSON logs?). Sublime Text slowed to a crawl and froze when attempting to open it, and I didn't even bother with VS Code.
On a hunch, I tried opening it in Notepad - and it loaded quite quickly!
vim does this too. If it hangs on you when opening a very large file, don't hesitate to press Ctrl+C. It does not terminate the whole process, but aborts the syntax highlighting, and the editor becomes very snappy.
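If you'd rather not have to interrupt it at all, something along these lines in your vimrc skips the highlighting up front for big files. This is a sketch: the 10 MB threshold is an arbitrary choice, and note that `syntax off` is global:

```vim
" Sketch: turn syntax highlighting off before reading files larger than ~10 MB.
" Caveat: 'syntax off' is global, so it also affects other buffers in the session.
autocmd BufReadPre * if getfsize(expand('<afile>')) > 10 * 1024 * 1024 | syntax off | endif
```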
I regularly work with Actions that produce very large logs (building Yocto) and I've found trying to diagnose build failures to be extremely frustrating since this change rolled out. I want to jump to the end of the log immediately, since that's presumably where I will find my error, but instead of being able to do that in one step, I get stuck in a treadmill of endlessly lazy-loaded log lines; every time I get to the bottom of the window, it's no longer the bottom of the window anymore, regardless of whether I used the mouse wheel or tried to yank the scrollbar down. Now when I have build failures, I find myself immediately just going for the raw logs, because I can jump to the end of them without putting in unreasonable effort.
Maybe this will sound dumb, but I had to deal with medium-large (not "very large") logs in the past, and I found that spending some time actually trimming down the messages was really helpful. Especially if you're using it for CI, some commands (e.g., wget, installers like apt-get, etc.) tend to produce what is essentially useless garbage (like XXXX.... progress indicators, "preparing to unpack", "unpacking", etc.). It's not something everyone bothers to do, but I spent some time making commands more quiet and actually keeping only the important parts, and I felt it paid off.
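Concretely, it's mostly a matter of flags like these (a sketch from memory, so double-check them against your tool versions; the URLs are just placeholders):

```bash
# The kind of quieting I mean; flag names are from memory, so verify for your tools.
wget -nv https://example.com/toolchain.tar.gz   # no progress bar, one summary line per download
apt-get install -y -qq build-essential          # only errors and a short action summary
pip install -q -r requirements.txt              # suppress per-package progress output
curl -sS -O https://example.com/artifact.zip    # silent, but still print errors
```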
All of which is to say—have you tried going this route? I'm not familiar with Yocto, but I feel like it's worth a shot regardless of the UI. There's only so much useful information to log, and it makes it easier to read the logs too, even with a perfect UI.
I like verbose logs, because when things fail you don't have to run the build again to get those log results. It's much easier to trim than to generate debug information from thin air.
...that's exactly why I spent some time writing concrete examples of precisely where that's not the case.
How exactly is seeing all this progress mess in your log helpful? If your network connection suddenly breaks at 158K, are you going to do anything different than if it broke at 60K? Do you really need every single line here when all you're doing is just an external download?
What I think you're not realizing is you waste so much time normally filtering through all this cruft every time you scroll through and filter logs; it's not a zero-cost addition. You're paying for it every single time, and the noise actually makes it harder to find stuff you're looking for. At best it wastes seconds of your (and your teammates') time every time; at worst it makes you (and your teammates) actually miss real problems. It actually helps productivity to filter things out, and often (like here) it doesn't even have an ongoing downside to begin with.
Note that apt in particular has one progress output which can be difficult to silence since it's actually coming from the underlying dpkg tool. Rather than passing a ton of flags into every apt invocation to deal with this, it can be helpful to drop a file like this into `/etc/apt/apt.conf.d/`, especially if you know that you're in a throwaway CI environment (or a base container only used for such things) where no one will be jumping into an interactive session afterward and be confused by differing defaults.
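Something along these lines, for example; the option names are documented apt.conf settings, but treat the file as a sketch to verify for your apt version rather than a drop-in (the filename is arbitrary):

```
// Sketch of a CI-only apt config, e.g. /etc/apt/apt.conf.d/99ci-quiet (name is arbitrary).
// Don't give dpkg a pseudo-terminal; this cuts down on the incremental progress output
// dpkg prints (the repeated "(Reading database ...)" lines that flood CI logs).
Dpkg::Use-Pty "0";
// Also turn off the full-width progress bar apt draws at the bottom of the terminal.
Dpkg::Progress-Fancy "0";
// Raise apt's quiet level (roughly equivalent to passing -qq on every invocation).
quiet "2";
```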
This doesn't seem to be working for me on Ubuntu 20.04 (and according to a comment on https://askubuntu.com/questions/258219/how-do-i-make-apt-get... it seems like I'm not the only one). If, by chance, you're on Ubuntu 20.04, did you happen to figure out a way to get rid of the rest of `apt-get`'s output?
Yup, thanks for sharing this! That's why I mentioned apt as an example. It's an upfront time investment (I think I spent hours trying to figure out how to suppress apt messages) so I can understand why it's not appealing, but you do it once and then it's so much cleaner forever after.
I use these, though. I can check what time the network connection died and correlate it against any container timeouts I have, or maybe networking troubles. I can run the job again and see if it fails at the same spot or not. It’s useful to have! (Although, I wouldn’t object to it being “collapsed” as many logs will show it.)
You're making a... very tenuous argument, to say the least.
For starters, your entire premise was "you don't have to run the build again". But now that I ask what you're going to do, you tell me you're... going to run the build again. So you've basically proven my point... you're going to waste that time sitting around anyway, and you'll need to reproduce it anyway, so just kick off an extra build or two in parallel if you need to correlate offsets. It's not like the 2nd one was going to be your last build anyway.
Now re: diagnosing networks by looking at the byte offsets... so you correlate it (2 builds sufficed apparently?) and then... what exactly are you going to do now? This is GitHub Actions we're talking about, with dependencies downloaded from third-party servers. You're not managing either party's network or storage infrastructure to try to diagnose intermittent "network troubles". All you see is, for whatever reason, ubuntu/pypi/nuget/whatever isn't returning the full file. Maybe it fails in the same spot every time, maybe it doesn't. What exactly are you going to do differently based on what # byte it's failing to return? Either you can increase some timeout or switch to another server or whatever, or you can't. It's not like you're going to pull half the file and just use that. All the progress information does is satisfy idle curiosity and make you go "oh, that's funny...", and then you're left with the same options you had anyway.
And honestly, how often does this even happen that you can't afford to tweak the build and increase the verbosity or whatever? These messages are just as useful as garbage >99.9% of the time, for the entire team... do you really encounter these issues often enough (and find 2 builds sufficient for fixing them) to warrant that? To say I have a hard time swallowing this is quite the understatement.
You still save a build, because (if needed) you can get results the next run rather than running two builds to figure out what went wrong. It’s not that I only have a chance to run two builds, but fewer builds saves time, especially in situations where running one takes a long time or things are falling over in such a way that it’s even hard to get a build scheduled.
As for what I would do once I have that information…for GitHub Actions not much other than report it to someone who might actually have use for that information. But for internal CI it can be useful because it can tell you things like “oh the build always fails 30 minutes in, perhaps something is timing out” or “it always fails at 68%…hmm, I can download this file when not on the corporate network. Ah, the firewall doesn’t like that packet”.
Again, I’m not saying I don’t agree with you that this is often not useful, but if you have a way to hide it away until it is necessary I would generally lean towards keeping it.
I haven't looked at what knobs are available to turn it down yet, but most of what it's logging is actually of interest to me - if it was any less verbose, it would make answering "why did this build take so much longer than the others?" much more difficult, and that's a question I have to answer with some regularity.
For diagnosing build speed, one thing I find helpful is trimming N messages down to 1-2 per command. E.g., if you spawn a process, don't emit "created process", "started process", "shutting down process", "initializing process", "terminated process"... while they all seem useful, just including one of them (like "started process") can trim things down significantly with negligible effect on usability. I can't say for sure the same applies to your use case, but it might be worth spending a couple hours at some point looking into what you can trim. It's easy to feel the upfront time investment isn't worth it, but it really compounds on a daily basis for every single person, so it's nice if you can address it.
Well it's more that these sorts of libraries _can_ live on their own and not be coupled to a core business. Not to mention that in the article they talk about how other open source libraries had some pitfalls.
It's perfectly reasonable to ask for it, and perfectly reasonable for them to say no :) I also wish they'd offered up the code, instead of dropping two measly bullet points. But they don't owe us anything.
FWIW I hate how the Discourse forum software not only does infinite scroll but then also hijacks Ctrl-F to go into its internal search utility. I will be very displeased if more websites follow this pattern.
To be honest, I'm not sure. Maybe hijacking Ctrl-F is the right thing to do here, but I wish that what it hijacks me to would be closer in behaviour to what the native browser find-in-page functionality is. Instead it whisks me off to a totally separate SERP with a list of post snippets totally void of their surrounding context, when typically the whole point of doing a find-in-page is not to just find a result, but to find information like "how many of these are there here, how are they clustered, and what kind of stuff is around them?"
You can click on the little gear icon in the top right of the screen that renders the logs and then "View Raw Logs", which is a raw text dump. It's perfect and fast.
I hope the UX is also considered for improvements. Currently it can only be described as "user hostile":
- only a small area of the screen is actually used for logging messages
- I'm constantly fighting the per-runstep folding views, but it's hard to describe what the actual problem is, except "it's awkward to use"
- when new messages are added to the log, the view scrolls to the end of the log, it's impossible to "catch" the view with the mouse.
- I don't know if I remember right, but searching only seems to work on the unfolded views, which is a pretty dumb idea.
etc...etc... it's really not a nice experience IMHO. From my POV (mostly C/C++ compile logs) performance is fine, but everything else sucks :)
Please have a look at any other CI system for inspiration: AppVeyor, Travis, GitLab... I never had a problem with their log views, I guess mostly because they are much simpler.
I hate this. One of the most common things I do is search and this breaks it. I also don't want your shitty override of the Ctrl+F key.
However, this may be because I am a Firefox user. I have found that for fairly simple HTML (basically just colouring and some links) such as this, Firefox handles all but the largest files very well (and my build logs aren't very large in the grand scheme of things), but Chrome often locks up on much smaller files.
I hope there is a way to disable this, or at least click though to the raw log without the JS nonsense.
I love good write-ups like this! Although I would have liked a bit more detail.
UX at scale is really difficult sometimes and there are constant trade-offs and limitations to deal with. Everyone expects perfectly smooth performance with data available in an instant, and it's easier to sit on the other side and judge than to actually work through the complications of solving it.
This is something that has frustrated me. We have a build that produces Docker images and ML tooling with full native toolchains, and its log is unusable.
I maintain Jenkins, and sometimes we pull a log that is about 500MB and it crashes no matter what I do. If I switch to the plain-text console view, the browser no longer crashes but is very slow.
If GitHub Actions offered a plain-text download option, it would make things way easier.
One alternative to virtualization I’ve been thinking about lately is the HTML canvas. I believe this is how Flutter works. It scrolls buttery smooth, you can style elements, and you can constrain it to a certain viewport. There may be a few ambiguities, so you might have to do some research to make the text look good, but it's just a thought.
It’s an accessibility nightmare to do it all in canvas; the text isn’t even exposed properly to screen readers, for example.
Couldn't you overlay html elements on top of the canvas elements? You don't necessarily have to use canvas text.
Each canvas element can tell you the size of each rect, so for each of the 50k log lines, you'd have an element in the canvas. When scrolling, the html elements can fade. As the rect scrolls into view, it draws the html div over itself, since each rect is uniquely associated with a line of text.
It was a thought with some of the previous work I had done with canvas.
Just to add on to this, since I can't edit, here's the Mozilla Developer Network entry that speaks to the accessibility challenges with canvas, for future reference in case someone stumbles upon this: