Some (most?) tools that output data in columns and fit each one to the largest value in that column need to scan the whole file as a first pass just to start displaying data.
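To make that concrete, here is a minimal sketch of the fit-to-widest approach (not any particular tool's code; it assumes naive comma-separated input). Nothing can be printed until every row has been seen, and the simplest way to do that is to hold all rows in memory:

```rust
use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    // Hold every row in memory: any later row could still widen a column.
    let stdin = io::stdin();
    let mut rows: Vec<Vec<String>> = Vec::new();
    for line in stdin.lock().lines() {
        let line = line?;
        rows.push(line.split(',').map(str::to_owned).collect());
    }

    // Pass 1: find the maximum width of each column.
    let ncols = rows.iter().map(Vec::len).max().unwrap_or(0);
    let mut widths = vec![0usize; ncols];
    for row in &rows {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(cell.len());
        }
    }

    // Pass 2: only now can the first row be printed.
    let stdout = io::stdout();
    let mut out = stdout.lock();
    for row in &rows {
        for (i, cell) in row.iter().enumerate() {
            write!(out, "{:width$} ", cell, width = widths[i])?;
        }
        writeln!(out)?;
    }
    Ok(())
}
```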
Not only is that the case with this tool, but from what I'm reading in main.rs, it looks like it's also loading the whole file into memory. I was going to say that scanning the file was a deal-breaker, but if this is true it's far more resource-intensive.
This looks like a nice tool, but these design choices seem to limit its use to relatively small files. It could be updated to use a read-ahead buffer instead and adjust its output as new lines with values of different widths are discovered, although doing this without a jarring resize could be challenging.
Could someone with better knowledge of Rust than mine confirm this?
I see the full dataset being loaded here [1] and the column widths being computed here [2].
> these design choices seem to limit its use to relatively small files
1. As a rule of thumb, I have been working on functionality before optimization. That said, `tv` is really fast. It is simply false that `tv` only works for relatively small files: I just pushed a 624MB file through `tv` and it ran in 2.8 seconds, while `column` takes 5.0 seconds on the same file. Now, I would love help from programmers smarter than me; I am sure there are plenty of optimization gains to be had in `tv`. I just wanted to make sure potential users are not misled: `tv` is performant.
> Some (most?) tools that output data in columns and fit each one to the largest value in that column need to scan the whole file as a first pass just to start displaying data.
> Not only is that the case with this tool, but from what I'm reading in main.rs, it looks like it's also loading the whole file into memory.
2. `tv` reads once, but parses partially. That is, it reads the full file only to grab the number of rows, and it parses (takes) only the first n rows.
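A minimal sketch of that "read fully, parse partially" strategy (illustrative only, not `tv`'s actual code; the filename, the row limit N, and the naive comma split are all assumptions):

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

const N: usize = 25; // hypothetical display limit, standing in for tv's first n rows

fn main() -> io::Result<()> {
    let reader = BufReader::new(File::open("data.csv")?); // assumed input file

    let mut total_rows = 0usize;
    let mut parsed: Vec<Vec<String>> = Vec::with_capacity(N);

    for line in reader.lines() {
        let line = line?;
        // Only the first N rows are parsed; every remaining
        // line is merely read and counted.
        if total_rows < N {
            parsed.push(line.split(',').map(str::to_owned).collect());
        }
        total_rows += 1;
    }

    // Column widths can now be computed from `parsed` alone,
    // while the full dimensions of the file are still known.
    println!(
        "dimensions: {} rows x {} cols",
        total_rows,
        parsed.first().map_or(0, Vec::len)
    );
    Ok(())
}
```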
If the goal is to calculate the correct column width, you have to do one pass through the data before writing the first row.
If the file can be read multiple times (not a UNIX stream), you can just read the file twice.
If the file is a stream, instead of retaining the entire dataset in memory, you can write to a temporary file and re-parse it after calculating the widths.
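A rough sketch of that temp-file approach (assuming the `tempfile` crate as a dependency; the comma split stands in for real CSV parsing):

```rust
use std::io::{self, BufRead, BufReader, Seek, SeekFrom, Write};

fn main() -> io::Result<()> {
    let mut spool = tempfile::tempfile()?; // deleted automatically on drop
    let mut widths: Vec<usize> = Vec::new();

    // Pass 1: spool the stream to disk while tracking each column's max width.
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line?;
        for (i, cell) in line.split(',').enumerate() {
            if i == widths.len() {
                widths.push(0);
            }
            widths[i] = widths[i].max(cell.len());
        }
        writeln!(spool, "{}", line)?;
    }

    // Pass 2: rewind the temp file and print with the final widths.
    spool.seek(SeekFrom::Start(0))?;
    for line in BufReader::new(spool).lines() {
        let line = line?;
        for (i, cell) in line.split(',').enumerate() {
            print!("{:width$} ", cell, width = widths[i]);
        }
        println!();
    }
    Ok(())
}
```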
The correct column width is calculated from the first n rows, not the full file.
A stream does not work for `tv` because a stream does not know how many rows are in the file a priori. Displaying the dimensions of the file is a priority for `tv`, and I am very happy with that trade-off: I would rather know the dimensions of a file than have a stream of unknown dimensions.
If you did it the way he's describing, you would stream through the file to count the rows while writing it out as a temp file, which you could then re-parse for the actual data.
I'm not saying you should or shouldn't, but your use case doesn't bar you from using streams.
I like this idea. I don't think it would be jarring if the read-ahead buffer were a fixed number of lines, i.e. if the output looked like distinct pages. The default could be at least the line height of the terminal, or some multiple of it.
There could be an option to redisplay the header row for resized "pages".
There could be a CLI switch giving the user control, i.e. making everyone happy.
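A sketch of what that paged behavior could look like (the page size, the header handling, and the comma split are all illustrative choices, not anything `tv` currently does):

```rust
use std::io::{self, BufRead};

const PAGE: usize = 40; // e.g. roughly one terminal height

// Compute widths for this page only, then print it.
fn print_page(rows: &[Vec<String>]) {
    let ncols = rows.iter().map(Vec::len).max().unwrap_or(0);
    let mut widths = vec![0usize; ncols];
    for row in rows {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(cell.len());
        }
    }
    for row in rows {
        for (i, cell) in row.iter().enumerate() {
            print!("{:width$} ", cell, width = widths[i]);
        }
        println!();
    }
}

fn main() -> io::Result<()> {
    let mut page: Vec<Vec<String>> = Vec::with_capacity(PAGE);
    let mut header: Option<Vec<String>> = None;

    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let row: Vec<String> = line?.split(',').map(str::to_owned).collect();
        if header.is_none() {
            header = Some(row.clone());
        } else if page.is_empty() {
            // Redisplay the header at the top of each new page.
            page.push(header.clone().unwrap());
        }
        page.push(row);
        if page.len() >= PAGE {
            print_page(&page); // column widths may change between pages
            page.clear();
        }
    }
    if !page.is_empty() {
        print_page(&page);
    }
    Ok(())
}
```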
It is more resource-intensive, but it pushes the problem you mentioned onto `tv`. If `tv` doesn't work with embedded EOLs, then you need to fix your data or fix your tool.
> Just show me the top 5 rows. That's all most people are looking for.
Is it? I'd wager that's at most half of its use. Accessing a specific section that could be anywhere in the file is very common in my experience, as is truly random access. Both of those, along with the first-few-rows use case, are far better served by a paging system.
[1] https://github.com/alexhallam/tv/blob/main/src/main.rs#L183-...
[2] https://github.com/alexhallam/tv/blob/main/src/main.rs#L218-...