My own editor is an array of lines in Ruby, and in now about 8 years of using it daily, with the actual editor interacting with the buffer storage via IPC to a server holding all the buffers, it's just not been a problem.
It does become a problem if you insist on trying to open files of hundreds of MB of text, but my thinking is that I simply don't care to treat that as a text editing problem for my main editor, because files that size are usually something I only ever care to view or am better off manipulating with code.
If you want to be able to open and manipulate huge files, you're right, and then an editor using these kinds of simple methods isn't for you. That's fine.
As it stands now, my editor holds every file I've ever opened and not explicitly closed in the last 8 years in memory constantly (currently, 5420 buffers; the buffer storage is persisted to disk every minute or so, so if I reboot and open the same file, any unsaved changes are still there unless I explicitly reload), and it's not even breaking the top 50 or so of memory use on my machine usually (those are all browser tabs...)
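To make the "array of lines" part concrete, here's a rough Ruby sketch of the idea (illustrative only, not the actual code from my editor; the class and method names are made up):

```ruby
# A buffer is just an Array of Strings; edits are plain Array/String operations.
class Buffer
  attr_reader :lines

  def initialize(path)
    @path = path
    @lines = File.exist?(path) ? File.readlines(path, chomp: true) : [""]
  end

  def insert(row, col, text)
    @lines[row] = @lines[row].dup.insert(col, text)
  end

  def delete_line(row)
    @lines.delete_at(row)
  end

  # crude "persist everything every minute or so" style checkpoint
  def snapshot(file)
    File.binwrite(file, Marshal.dump(@lines))
  end
end
```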
I'm not suggesting people shouldn't use "fancier" data structures when warranted. It's great some editors can handle huge files. Just that very naive approaches will work fine for a whole lot of use cases.
E.g. the 5420 open buffers in my editor currently are there because even the naive approach of never garbage collecting open buffers just hasn't become an issue yet - my available RAM has increased far faster than the size of the buffer storage so adding a mechanism for culling them just hasn't become a priority.
Oh by "more complex" operations I referred to multiple cursors and multi line regex searches. I've noticed some performance problems in my own editor but it's mostly because "lines" become fragmented, if you allocate all the lines with their own allocation, they might be far away from each other in memory. It's especially true when programming where lines are relatively short.
Regex searches and code highlighting might introduce some hitches due to all of the seeking.
Kakoune has been my main editor for the past year (give or take) and uses an array of lines [0]. Ironically, multi-cursor and regex are some of the main features that made it attractive to me.
I just tested it out on the 100MB enwik8 file I have lying around and it does slow down significantly (it took 4-5 seconds to load the file and there's a 1 second delay on changing a line). But that is not really the type of file you would be opening with your main editor.
There's a wildly out of date repo here[1] that I badly need to push updates to, with the caveat that odds are there are lots of missing pieces that'll make you struggle to get it working on your system. I wouldn't recommend it - I dumped it on GitHub mostly because why not, rather than for people to actually use.
Difficulties will include e.g. helper scripts executed to do things like stripping buffers, a dependency on rofi when you try to open files, and a number of other things that work great on my machine and not so well elsewhere.
I have about 2-3 years' worth of updates and cleanups I should get around to pushing to GitHub that do include some attempts to make it slightly easier for other people to run.
The two things I think are nice and worth picking up on are the use of DRb to get client-server, which means the editor is "multi window" simply by virtue of spawning a new separate instance of itself, and the reliance on a tiling wm for multi-pane/frame: splitting the buffer horizontally or vertically is "just" a matter of a tiny helper script ensuring the new window opens below or to the right of the current window respectively.
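For anyone unfamiliar with DRb, the split looks roughly like this (a minimal sketch, not my actual code; BufferServer and the port are made up for the example):

```ruby
require 'drb/drb'

SERVER_URI = 'druby://localhost:8787'

# One long-lived process owns all the buffers...
class BufferServer
  def initialize
    @buffers = {}  # path => Array of lines
  end

  def open_buffer(path)
    @buffers[path] ||= File.readlines(path, chomp: true)
  end

  def replace_line(path, row, text)
    @buffers[path][row] = text
  end
end

if ARGV.first == 'server'
  DRb.start_service(SERVER_URI, BufferServer.new)
  DRb.thread.join
else
  # ...and every editor window is just another DRb client talking to it.
  buffers = DRbObject.new_with_uri(SERVER_URI)
  p buffers.open_buffer(__FILE__).first
end
```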
But some other things, like the syntax highlighting (using Rouge), are in need of a number of bugfixes and cleanups; I keep meaning to modify the server to keep metadata about the lines and pull the syntax highlighting out so it runs in a separate process, talking directly to the server, for example.
> The core data structure (array of lines) just isn't that well suited to more complex operations.
Modern CPUs can read and write memory at dozens of gigabytes per second.
Even when CPUs were 3 orders of magnitude slower, text editors using a single array were widely used. Unless you introduce some accidentally-quadratic or worse algorithm in your operations, I don't think complex data structures are necessary in this application.
The actual latency budget would be less than a single frame (around 16 ms at 60 Hz) to be completely non-noticeable, so at dozens of GB/s you are in fact limited to less than 1 GB of memory to move per keystroke. And each character may hold additional metadata like syntax highlight states, so 1 GB of movable memory doesn't translate to 1 GB of text either. You are still correct in that a line-based array is enough for most cases today, but I don't think it's generally true.
> The core data structure (array of lines) just isn't that well suited to more complex operations.
Just how big (and how many lines) does your file have to be before it is a problem? And what are the complex operations that make it a problem?
(Not being argumentative - I'd really like to know!)
On my own text editor (to which I lost the sources way back in 2004) I used an array of bytes, had syntax highlighting (using single-byte start-stop codes), and used a moving "window" into the array for rendering. I never saw a latency problem back then on a Pentium Pro, even with files as large as 20MB.
I am skeptical of the piece table as used in VS Code being that much faster; right now on my 2011 desktop, a VS Code with no extra plugins has visible latency when scrolling by holding down the up/down arrow keys with a really high keyboard repeat setting. The same computer, same keyboard repeat and same file using Vim in a standard xterm/uxterm has visibly better scrolling; it takes half as much time to get to the end of the file (about 10k lines).
From what I have experienced, the complex data structures used here are more about maintaining responsiveness when overall system load is high, and that may result in slightly slower performance overall. Say you used the variable "x" a thousand times in your 10k lines of code and you want to do a find and replace on it to give it a more descriptive name like "my_overused_variable" - think about all of the memory copying that is happening if all 10k lines are in a single array. If those 10k lines are in 10k arrays which are each twice the size of the line, you reduce that a fair amount. It might be slower than simpler methods when the system load is low, but it will stay responsive longer.
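A rough Ruby illustration of the difference being described (sketch only; "code.rb" is a stand-in file, and Ruby's string copies aren't raw memmoves, but the shape of the argument is the same):

```ruby
# One big buffer: gsub has to build a fresh copy of the entire ~10k-line text.
big = File.read('code.rb')
big = big.gsub(/\bx\b/, 'my_overused_variable')

# Array of lines: only the lines that actually contain "x" get rewritten.
lines = File.readlines('code.rb', chomp: true)
lines.map! { |l| l =~ /\bx\b/ ? l.gsub(/\bx\b/, 'my_overused_variable') : l }
```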
I think vim uses a gap structure, not a single array, but I don't remember.
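For reference, a gap buffer works roughly like this (a toy sketch, not vim's actual implementation): the text sits in one array with a gap of free space at the cursor, so consecutive insertions only shuffle memory when the gap has to move.

```ruby
class GapBuffer
  def initialize(capacity = 64)
    @buf = Array.new(capacity)
    @gap_start = 0
    @gap_end = capacity
  end

  # slide the gap so it sits at pos, shifting only the characters in between
  def move_gap(pos)
    if pos < @gap_start
      n = @gap_start - pos
      @buf[@gap_end - n, n] = @buf[pos, n]
      @gap_end -= n
      @gap_start = pos
    elsif pos > @gap_start
      n = pos - @gap_start
      @buf[@gap_start, n] = @buf[@gap_end, n]
      @gap_start += n
      @gap_end += n
    end
  end

  def insert(pos, ch)
    grow if @gap_start == @gap_end
    move_gap(pos)
    @buf[@gap_start] = ch
    @gap_start += 1
  end

  def to_s
    (@buf[0...@gap_start] + @buf[@gap_end..]).join
  end

  private

  # double the backing array, adding the new free space to the gap
  def grow
    extra = @buf.size
    @buf.insert(@gap_end, *Array.new(extra))
    @gap_end += extra
  end
end
```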
I am not a programmer, my experience could very well be due to failings elsewhere in my code, and my reasoning could be hopelessly flawed; hopefully someone will correct me if I am wrong. It has also been a while since I dug into this; the project that got me to dig into it is one of the things that finally got me to make an account on HN, and one of my first submissions was Data Structures for Text Sequences.
VS Code used 40-60 bytes per line, so a file with 15 million single-character lines balloons from 30 MB to 600+ MB. kilo uses 48 bytes per line on my 64-bit machine (though you can make it 40 if you move the last int next to the other 3 ints instead of wasting space on padding for memory alignment), so it would have the same issue.
I have never seen a file like this in my life, let alone opened one. I'm sure they exist and people will want to open them in text editors instead of processing with sed/awk/Python, but now we're well into the 5-sigma of edge cases.
> The core data structure (array of lines) just isn't that well suited to more complex operations.
Anyway here's what I built: https://github.com/lorlouis/cedit
If I were to do it again I'd use a piece table[1]. The VS Code folks wrote a fantastic blog post about it some time ago[2].
[1] https://en.m.wikipedia.org/wiki/Piece_table [2] https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...
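Roughly, the piece-table idea looks like this (a minimal sketch under my own simplifying assumptions, not what VS Code actually ships): the original file stays read-only, insertions go into an append-only "add" buffer, and the document is just an ordered list of (buffer, start, length) pieces.

```ruby
Piece = Struct.new(:buffer, :start, :length)

class PieceTable
  def initialize(text)
    @original = text
    @added = +""
    @pieces = [Piece.new(:original, 0, text.length)]
  end

  # insert by splitting the piece that covers pos; no existing text is copied
  def insert(pos, text)
    new_piece = Piece.new(:added, @added.length, text.length)
    @added << text

    offset = 0
    @pieces.each_with_index do |piece, i|
      if pos <= offset + piece.length
        split = pos - offset
        before = Piece.new(piece.buffer, piece.start, split)
        after  = Piece.new(piece.buffer, piece.start + split, piece.length - split)
        @pieces[i, 1] = [before, new_piece, after].reject { |p| p.length.zero? }
        return self
      end
      offset += piece.length
    end
  end

  def to_s
    @pieces.map { |p| (p.buffer == :original ? @original : @added)[p.start, p.length] }.join
  end
end

pt = PieceTable.new("hello world")
pt.insert(5, ",")
puts pt.to_s   # => hello, world
```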