I work in critical infrastructure planning. My organization builds software in R, Python, and other programming languages customized for these major organizations.
So many critical infrastructures, billions of dollars in planning, and the systems are just built out of Excel. It's amazing. You'd assume something that serves millions of people a day would have a more sophisticated, customized solution, but you'd be wrong.
The reason is that most people know how to use Excel. Most people know how to handle it, use it, modify it, build up from it. You don't need a regular support contract from a company for your custom-built spreadsheets. If something critical in that "Excel software stack" breaks (aka Microsoft Excel itself), most of the time you just need to restart Excel. It's amazing. You don't need a new server, you don't need support contracts; it's just like riding a bike.
In my opinion, the real limitation of Excel is big-data analytics. The way Excel handles large quantities of data is slow (due to the nature of its software structure). That's where we come in and build out these models and systems. In the end, however, our data outputs are fed back into Excel, because that's what most people are used to.
I have deep respect for Excel. After all, Excel empowers so many users who don't know how to code to provide amazing plots and perform major calculations with ease.
Yeah it works and you don't need support contracts and then:
- Fidelity's "Minus Sign Mistake": loss of $1.3 billion
- TransAlta "Clerical Error": loss of $24 million
- Fannie Mae "Honest mistake": loss of $1.3 billion
Then you get employee turn over where new employees don't get "arcane" knowledge passed down by people who left and took their spreadsheet foo with them.
Excel does not have "access control", "auditing", "change tracking". Just getting work done is not enough.
> - Fidelity's "Minus Sign Mistake": loss of $1.3 billion
> - TransAlta "Clerical Error": loss of $24 million
> - Fannie Mae "Honest mistake": loss of $1.3 billion
Weigh that against <insert any company> makes $X billion due to correct usage of Excel, multiplied thousands and thousands of times over. Survivorship bias at its finest.
Excel isn't perfect and has notable downsides, but work gets done all of the time on them for the aforementioned reasons.
I’m not convinced there’s such a thing as “correct usage of Excel.”
What a lot of companies do is just scare entry level employees into being very careful. My friend’s girlfriend works at a place like this and people just accept all the problems with manually editing large spreadsheets because “that’s just life.”
Anything in Excel is a hack, and most of its users don’t know any better.
The manager goes to the people who really know how to make a report (maybe that's "accounting"?) and asks them to make it, taking into account the report-specific things they discuss. Then those people deliver. They might have someone on their team who knows how to work with dataframes, which would come in handy. Those people store the program written to generate the report from the data and are ready to modify it when more is requested. Instead of coding in Excel, they simply write their code in whatever language they use to process spreadsheets or other data sources.
If they have knowledgeable people there, it will take no longer than coding something up in Excel and will be more reliable.
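As a sketch of what such a stored, repeatable report script might look like (all names and sample rows here are hypothetical; plain stdlib Python over exported rows, standing in for a dataframe library):

```python
from collections import defaultdict

def monthly_report(rows, month):
    """Hypothetical report: total amount per category for one month."""
    totals = defaultdict(float)
    for row in rows:
        if row["month"] == month:
            totals[row["category"]] += float(row["amount"])
    return dict(totals)

# Rows as they might come out of a data export (made-up sample)
sample = [
    {"month": "2019-05", "category": "travel", "amount": "120.50"},
    {"month": "2019-05", "category": "travel", "amount": "30.00"},
    {"month": "2019-06", "category": "office", "amount": "75.25"},
]
print(monthly_report(sample, "2019-05"))  # {'travel': 150.5}
```

Because the script is stored, next month's report is a one-argument change rather than a fresh round of manual spreadsheet edits.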
I guess no one has used Power BI for reporting... Link the data source, save the view that "Manager A" wants every week or month, set up an on-premises gateway, and your new reports are there before you are asked for them again. Always up to date. Then share your dashboard.
Also, you guys should be ashamed of yourselves. "Not sure who does the reports, maybe finance?" Everyone should know how to make their own reports.
If you regularly produce reports, one would suppose you'd have an existing project template and the know-how to let a reasonable individual deliver a result that isn't all wrong, in a length of time not THAT dissimilar from throwing up a spreadsheet.
This also wouldn't silo the data on the user's laptop, subject it to destruction because the user dropped the machine, or limit the frequency of updates and access to how fast the user can respond to emails.
And meanwhile the manager that broke the rules, connected their personal copy of excel straight into the database and made the company 8 figures in profit over the course of the last year just outdid you and your team of report programmers.
And meanwhile an actual data scientist looked at the data and outdid the manager by a factor of 10, by creating a more meaningful report, that is based on an actual statistically sound basis, preventing a disaster.
Logically, most profit-making activities may be enabled by tech, but the profit usually isn't created by tech. The manager's rule-breaking is probably unrelated to his success.
In my experience, business success is often hinged on the reports and the insights they provide. Without accurate numbers, you've got nothing to go on.
I think it will be easier to solve those problems in the Excel development space than it will be to export the Excel solutions to another solution domain.
Also the kind of mistakes you've described can (and do) happen in other programming domains that are deemed more respectable.
Excel is hamstrung by having to maintain backwards compatibility for an endless number of hacks.
For instance, it is notorious that
0.1 + 0.2 != 0.3
in binary exponent floating point math. Excel does funny stuff with number formatting that hides this, but it is like having a bubble under a plastic sheet that moves someplace else when you push on it -- numeric strangeness appears in different places.
The right answer is to go to decimal exponent floating point math, but that is only HW accelerated on IBM Mainframes, maybe on RISC-V at some point. You'll probably crash Excel if you have enough numbers in it for performance to matter, but Microsoft would be afraid of any performance regression and it would break people's sheets so it won't happen.
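The float quirk above is easy to reproduce, and Python's stdlib decimal module shows what decimal arithmetic gives you instead (a minimal illustration, not a statement about Excel's internals):

```python
from decimal import Decimal

# Binary floating point: 0.1 + 0.2 picks up rounding error
binary_sum = 0.1 + 0.2
print(binary_sum)         # 0.30000000000000004
print(binary_sum == 0.3)  # False

# Decimal arithmetic behaves the way an accountant expects
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```

Excel's number formatting hides the binary result by rounding for display, which is exactly the "bubble under the plastic sheet" effect: the error is still there, just somewhere you aren't looking.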
On a technical basis we could use an Excel replacement that has some characteristics of Excel, and other characteristics of programming languages; one old software package to look to for inspiration is
What makes it almost impossible to do on a marketing basis is that Excel is bundled into Microsoft Office so if you have an Office subscription you have Word, Powerpoint, Excel, Access, etc.
GP was about how Excel lacks "access control", "auditing", "change tracking". Then OP says it'll be easier to add those things to Excel than to make non-Excel suit the purpose.
Then you say no, backwards compatibility problems. But how does Excel's need for backward compatibility make it hard to add "access control", "auditing", or "change tracking"?
"access control" by definition means "make things not work some of the time"
Excel has change tracking, but like Jupyter notebooks and similar products it doesn't make the clean distinction between code and data that is necessary for it to be useful. (e.g. if I develop an analysis pipeline and use it for May 2019 it should be as easy as falling off a log to run it for June 2019)
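The code/data distinction above can be sketched in a few lines: the pipeline is fixed code, the month is just a parameter (a toy example with made-up data; the "analysis" is a stand-in average):

```python
# The code is fixed; only the data selected (the month) changes per run.
def run_pipeline(data_by_month, month):
    rows = data_by_month[month]
    return sum(rows) / len(rows)  # stand-in for the real analysis

data = {"2019-05": [10, 20, 30], "2019-06": [40, 60]}
print(run_pipeline(data, "2019-05"))  # 20.0
print(run_pipeline(data, "2019-06"))  # 50.0
```

In a spreadsheet, by contrast, formulas and values live in the same grid, so "rerun for June" usually means copying a sheet and hoping every formula came along intact.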
"At its heart, VisiCalc is about numbers. One of the early decisions we made was to use decimal arithmetic so that the errors would be the same one that an accountant would see using a decimal calculator. In retrospect this was a bad decision because people turn out to not care and it made calculations much slower than they would have been in binary."
That sounds like you want to rebuild the whole paradigm of how 99% of software world treats numbers (which may be, at the end, a right thing to do but unlikely to happen at least in the near term) and you are laying the fault of it not happening at Excel's door. That would be very misplaced blame.
You don't need access control because it's just an Excel file, and whoever has the file has the planning model. In most organizations like this, there are only 5 or 8 people who all work together on a team; it's not 50-some people who are separated. Excel is a tool, a tool that you use to get things done. Even programming solutions have issues, such as Knight Capital Group's $460 million loss due to mistakes in their code and deployment processes. Planning modules are not operational modules; it's a big jump between the two.
Again, these are all tools. What's important is how you use them and how you properly validate the calculations. The thing you also need to understand is that no critical-infrastructure planning produces perfectly to-the-dot numbers. Our systems are far too complex, built off of human operational decisions, and have so many unknown losses that planning modules are based on scenarios. Also, we're talking about thousands of these agencies around the world, and their critical-infrastructure planning teams are mostly 5 to 8 people. The current agency I'm working with has hired around 3,000 employees to operate the systems, but only 5 or so are really in the deep, running these models and building these plans.
These people are not programmers. They spent their time learning about resource planning, mathematics, operation theory, physics, and engineering. Excel is an incredibly user-friendly tool that lets them automate a tremendous amount of their tasks. They're all very intelligent and can definitely learn how to code, but that takes their time away from more critical skills and tasks they need to accomplish.
The most important solution to the problems you've stated is the workflow. You set up a proper workflow, and the risks of these problems should be minimized.
>Then you get employee turn over where new employees don't get "arcane" knowledge passed down by people who left and took their spreadsheet foo with them.
Yeah, this never happens with a proprietary codebase!
Who needs historical knowledge when you can just rewrite the program every few years in a different environment with all new bugs? (and a smattering of the old ones, too)
Few work things have bemused me more in the last couple of years than watching someone re-implement a slow SQL process on an HDFS big-data platform, forgetting all of the things that went wrong the first time, fixing them again, and then running into weeks' worth of new bugs, just to get non-time-critical data faster.
Fidelity didn't lose $1.3B, they miscalculated by $1.3B. The same with the other numbers you quote. Spreadsheets are error prone, but they still make sense for a lot of use cases.
The Fidelity "minus sign mistake" didn't create a loss.
The mistake was in relaying the information to the end user. It didn't actually cause a loss of that magnitude.
That is like saying if I mistyped in a word doc, that word created the loss.
Arcane knowledge will be an issue with magical spreadsheets, as it is with any software (and I could argue that with custom software you have this legacy knowledge over both code AND ui, which is worse than just a spreadsheet).
The last paragraph is key, though - auditing and access control are tough ones for excel, and there are many tasks that require this functionality.
The trade-off is customizability: any Excel sheet can be changed to your specific need, whereas that nice and shiny enterprise software you build may lack one or two things which you will never be able to change.
> Excel does not have "access control", "auditing", "change tracking". Just getting work done is not enough.
Doesn’t the web version of Excel already solve that? I’m not familiar with it but Google Sheets does these things, at least to some extent - I’d expect Office 365 to do so as well.
While I haven't encountered localized names, localized formats make Excel an absolute pain for me.
My language uses the "European" number format of a comma for decimals and periods for thousands separators. Excel tries to adjust to that by using semicolons as argument separators (e.g. ADD(1.5, 3.5) -> ADD(1,5; 3,5)).
The problem is that their locale detection is wildly inconsistent and there isn't a good way to override it without changing system settings. When moving a file between computers, this is an utter disaster.
Excel goes one step further, and assumes that users in some locales (e.g., Dutch) want semicolons as separators in every CSV file you open in Excel.
I don't care (I just use LibreOffice Calc, which accepts any delimited-values file just fine and simply asks which separator to assume). But it means that when you develop an option for users to download some statistical data as a comma-separated values file (which is easy to generate), you now have a problem if that user wants to open it in Excel (which makes sense for CSV) and is using one of Excel's broken locales (e.g., Dutch), where comma-separated values are not understood (again, it wants semicolons in Dutch).
I.e., the expected file format changes because of the locale Excel uses! So where someone in the US can just double click the CSV file (not to mention anyone using LibreOffice/OpenOffice Calc anywhere in the world), someone using Excel in the Netherlands will just get a spreadsheet where every line consists of that whole line of values stuffed into column A.
And that's not even the worst. You can teach people the few buttons necessary to read in the text, but once they save it again, the file is changed! Now you have a different separator, and chances are good Excel also mangled date columns, removed leading zeros from numbers, and switched out decimal separators! So you may not be able to further process the file. It's also sad, because if they just made some of these changes optional, Excel would be a good tool to quickly work on a CSV file and pass it on. But as is, I am always hesitant to work with CSV in Excel, because it mostly breaks and becomes unusable for further processing.
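For comparison, reading a "European-locale" file outside Excel just means stating the separator explicitly, which is essentially the manual step LibreOffice's import dialog asks you to confirm (a minimal sketch with made-up sample data):

```python
import csv
import io

# A "Dutch-locale" file: semicolon separators, decimal commas
dutch_csv = "name;value\npi;3,14159\n"

# Pass the delimiter explicitly instead of relying on locale guessing
rows = list(csv.reader(io.StringIO(dutch_csv), delimiter=";"))

# Convert the decimal comma back into a machine-readable float
value = float(rows[1][1].replace(",", "."))
print(rows[1][0], value)  # pi 3.14159
```

The point being: the ambiguity is resolvable in one line when the tool lets you say what the separator is, rather than silently deciding for you based on the OS locale.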
Oh, I didn't know that's a localization thing. I use the German version and got used to opening CSVs in Notepad first (search and replace ; with ,).
I always thought that's some kind of the usual MS vs. the rest of the world thing.
You can specify what separator to use as the first line in the CSV file:
sep=,
My experience is that Excel 2007 and later will correctly parse the file and use the specified separator. However, other software, such as Google Sheets, will simply render the declaration as-is.
Then there's the issue of what character encoding to use to encode the file, whether to include a byte-order-mark with UTF-8 to make Excel recognize the file as UTF-8 and the effect that has on whether the separator line is recognized (spoiler: it isn't).
Here's part of the documentation I wrote for Calcapp's CSV exporter, which digs into these issues in more detail (Javadoc):
/**
* The prologue of files containing comma-separated values (CSV). This
* prologue contains an instruction detailing the separator character that is
* used in the file. This instruction is known to be understood by Microsoft
* Excel 2007 and later versions, but is rendered as-is by other spreadsheets,
* including Google Sheets.
* <p>
* Microsoft Excel expects either a comma or a semicolon to separate values in
* CSV files, depending on the Windows locale. The only way to produce a CSV
* file that can be read by Excel regardless of what locale Windows is set to
* use is to use a prologue similar to this one, which explicitly tells Excel
* which separator is used.
*/
private static final String PROLOGUE = "sep=" + SEPARATOR_CHARACTER + "\n";
/**
* The character set used to encode files containing comma-separated values
* (CSV): UTF-16LE (UTF-16 for little-endian systems). Using UTF-16LE allows
* characters that cannot be represented by the ASCII character encoding to be
* correctly read by Microsoft Excel and other spreadsheets.
* <p>
* There is no way to formally specify the character set used by a CSV file.
* With one exception, Excel assumes that CSV files use the ASCII character
* encoding, unless the first three bytes consist of a byte-order mark, in
* which case the UTF-8 encoding is used. (A byte-order mark is redundant for
* UTF-8, as it does not depend on endianness, but is traditionally used by
* Microsoft Windows applications to detect whether a text file uses the UTF-8
* encoding.)
* <p>
* Excel only recognizes a file as being encoded with UTF-8 if a byte-order
* mark is included, but doing so prevents Excel from recognizing the
* information of the {@linkplain #PROLOGUE prologue}, which in turn prevents
* CSV files from being produced which work regardless of the locale Windows
* is set to use. This is likely due to a bug, present in Excel 2007 and
* likely later versions as well (based on anecdotal evidence).
* <p>
* Fortunately, Excel does recognize another character set which can encode
* all of Unicode: UTF-16. Excel likely uses heuristics to determine that a
* file is encoded using UTF-16. (Text written using Western languages and
* encoded using UTF-16 tend to include many null bytes for various reasons,
* making the detection of UTF-16 trivial, but only for Western languages.)
* Unfortunately, UTF-16 is dependent on endianness, meaning that it would be
* desirable to include a byte-order mark at the beginning of the file.
* However, that does not work due to the aforementioned bug.
* <p>
* In other words, using UTF-16LE should work well for CSV files containing
* mostly Western text and parsed on little-endian systems. It is probable,
* though, that files produced using this converter will not work if the text
* mostly contains Chinese, Japanese or Korean characters or if the file is
* parsed on a big-endian system.
*/
public static final Charset CHARACTER_SET;
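A minimal Python sketch of the same approach described in the documentation above, i.e. a "sep=" prologue plus UTF-16LE without a byte-order mark (file name and sample rows are made up):

```python
import csv
import io

SEPARATOR = ","
rows = [["city", "rtt"], ["Zürich", "21.5"]]

# "sep=" prologue first, then the data rows
buf = io.StringIO()
buf.write(f"sep={SEPARATOR}\r\n")
csv.writer(buf, delimiter=SEPARATOR, lineterminator="\r\n").writerows(rows)

# Encode as UTF-16LE, deliberately without a byte-order mark: per the
# behavior described above, a BOM would keep Excel from recognizing
# the "sep=" line.
data = buf.getvalue().encode("utf-16-le")
print(buf.getvalue().splitlines()[0])  # sep=,
```

Writing `data` out with a `.csv` extension then gives a file that Excel 2007+ should parse with the declared separator regardless of the Windows locale, subject to the caveats about big-endian systems and CJK text noted in the documentation.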
If I'm going to mangle CSV into a non-standard form to conform to whatever Excel expects, I might as well go the extra mile and just provide an XLSX file.
That was my solution to a data-export function in an API (it switches on the HTTP Accept header). It now offers a choice of CSV (RFC 4180), ODS, and XLSX, and people can just pick XLSX if their Excel is using one of its broken locales.
It's not too hard to generate the XML for ODS (OpenOffice/LibreOffice etc.) and XLSX (Excel) from a bunch of tabular records once you've set up minimal empty template ODS/XLSX files to inject it in. It's certainly less work than explaining to Excel users that their software is broken and how to work around it.
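The Accept-header switch mentioned above can be sketched roughly like this (a hypothetical helper with simplified MIME matching; real content negotiation also handles q-values and wildcards):

```python
# Map an HTTP Accept header to an export format (simplified sketch)
def pick_format(accept_header: str) -> str:
    mapping = {
        "application/vnd.openxmlformats-officedocument"
        ".spreadsheetml.sheet": "xlsx",
        "application/vnd.oasis.opendocument.spreadsheet": "ods",
        "text/csv": "csv",
    }
    for mime, fmt in mapping.items():
        if mime in accept_header:
            return fmt
    return "csv"  # RFC 4180 default

print(pick_format("text/csv"))  # csv
print(pick_format("application/vnd.oasis.opendocument.spreadsheet"))  # ods
```

Users on one of Excel's broken locales then simply request XLSX and sidestep the CSV separator problem entirely.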
+1000. This is ridiculous. Couldn't they at least make this optional, so you'd be able to switch it off somewhere in a configuration dialog, the registry, or the command line? I hate localized apps altogether (who even needs a localized Visual Studio, srsly? I can't believe anybody can be competent in a .NET programming language and the relevant frameworks and practices without being able to read English). Localizing some apps makes sense for some people, but localizing functions feels beyond reason. I feel thankful they didn't localize C# and command-line commands :-)
The worst of it is that there are some very specific (and undocumented) edge cases where the translation won't be done properly.
It creates very 'fun'-to-debug issues where a spreadsheet works in a French Excel and not in an English one.
(For instance if you use the "Row-Column" cell reference, in English it is `R1C1` while in French `L1C1` - and it won't translate)
Even worse, the token used to separate arguments is localized. So you may have to use semicolons instead of commas. But in some places you need to pass a string representation of a formula and there Excel will only understand the non-localized commas. Fun all around!
I once had a contracting gig where I was tasked with updating a bunch of Word BASIC macros (this was Office 95 days...) used to maintain the ISO 9001 documentation for a major company.
Problem was the original version was written in their Danish office, and whoever wrote it had handed it to the Norwegian office exported as text before he left, and they'd imported it and started munging it and adding lots of stuff before someone bothered to try running it and realized it was completely broken. I wish I'd dug more into how they'd managed to get themselves into a situation where they'd significantly modified the code before trying to run it even once...
Version control? What is version control?
So they had this broken version that had some Danish keywords with a bunch of Norwegian updates, that they didn't want to just re-import into a Danish version and copy over properly in a tokenized form and having to try to identify and re-apply the Norwegian updates. So I got the fun job of properly translating it and figuring out what the new code was meant to do and make it do it in the process, with no access to any of the people who had worked on it.
[it is worth pointing out that this was an office for several hundred staff of a major international company that developed large scale information systems for things like police departments; while version control wasn't everywhere in the mid 90's, a large systems integrator certainly ought to be using it... Particularly amusing that it was their ISO 9001 documentation that was being handled in such a haphazard manner; happy to have never been a customer of theirs, though]
The task was extra "fun" for certain values of fun, because unlike, say, English vs Norwegian, or English vs Danish, Danish and Norwegian are close enough that 80% of the time a term from the Danish version might look right and be valid Norwegian, but the odds seemed to be (and maybe my memory is exaggerating it due to my lasting memory of extended pain) about 50/50 that the Norwegian translator of Word BASIC had chosen a different function name to the Danish translator for no good reason. Or there were slight, hard to spot, spelling differences. And sometimes what looked like a mistake was a function defined locally.
I spent a couple of weeks reading through the whole thing and mostly rewriting code that could have been written from scratch in a couple of days if they'd just told me what the end result was meant to be instead of dumping a pile of non-functioning code on me. But I guess I was cheap compared to their regular staff back then.
It was how I learned Word BASIC (I'd taken the contract, figuring it'd be easy to pick up, so I told the recruiter that sure I knew it, on the basis that I certainly knew a couple of variants of BASIC). Never to use it again (at it was replaced by a Visual Basic variant a few years later anyway)
On the same hardware, with the same table, Excel in Windows 7 does a cross-tab faster than MySQL in Ubuntu does. Indeed, MySQL chokes if there are more than a few hundred columns, but Excel just keeps going. And it uses all CPU cores.
Edit: In case anyone is wondering why I did that, I wanted a simple visualization of ping-location results for thousands of IPv4 from several hundred ping-probe locations. So that meant aggregating (getting minimum rtt for) millions of observations, and displaying min(rtt) in a IPv4 by probe location cross-tab. MySQL did a great job at the aggregation, but choked (as in, errored out) with several hundred numeric columns. Even if I converted them all to SMALLINT.
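The aggregation described above, min(rtt) per (IP, probe) pair arranged as a cross-tab, can be done with plain dicts; a toy sketch with made-up observations:

```python
# (ip, probe, rtt) observations -> {ip: {probe: min_rtt}} cross-tab,
# roughly what the pivot described above computes
observations = [
    ("1.2.3.4", "ams", 12.1), ("1.2.3.4", "ams", 11.7),
    ("1.2.3.4", "nyc", 80.2), ("5.6.7.8", "ams", 25.0),
]

crosstab = {}
for ip, probe, rtt in observations:
    row = crosstab.setdefault(ip, {})
    row[probe] = min(rtt, row.get(probe, float("inf")))

print(crosstab["1.2.3.4"])  # {'ams': 11.7, 'nyc': 80.2}
```

This is the shape of workload a columnar engine (or Excel's pivot machinery) is optimized for, which is why a row-oriented RDBMS with a hard column limit struggles with the wide output table.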
The other thing people don't realize about Excel is that it has a compressed, in-memory, columnstore analytics database built into it. It was called Power Pivot then Power Query, not sure if it changed names again, but it's actually a SQL Server Analysis Services Tabular Model, the same thing that's in Power BI as well. It's going to perform much better than a row-oriented RDBMS at aggregating columns because that's all it was designed to do.
It was a separate data processing engine within Excel developed by another team (SQL server) for the purposes of self service BI (Power Pivot/Query). Most probably, this engine is not used for normal (traditional) calculations.
You mean if you create the pivot table from data in a sheet? It's not needed for that because you'll run out of memory trying to fit any more than about a couple million rows in a sheet first. Meanwhile you can easily load 10s and possibly 100s of millions of rows into Power Query.
You're blowing my mind here a little bit. I don't use MySQL but do use Excel and the R/Python data science stacks quite a bit. I regularly open data with tens or even hundreds of thousands of columns in the latter by using lazy computations. Is this not the case with database approaches? What kind of hardware are we talking? Presumably Excel has to load everything into memory at once since it's immediately viewable, right?
In MySQL, maximum row size for non-text columns is 65,535 bytes. MySQL is limited to 4096 columns per table, and 1017 columns per table if the InnoDB engine is being used.
Excel's limits are 16384 columns and 1048576 rows.
I'm talking wimpy hardware here, I admit. Basically, VirtualBox VMs on a quad-core i5 box with SSD and 8GB RAM. With the VM having three cores and 6GB RAM. But it was the same wimpy hardware for Windows 7 and Ubuntu.
Notwithstanding the row length and column count limits, did you also give MySQL more memory? Setting InnoDB buffer pool size to 3 or 4 GB might help; its default is about 134 MB.
> Presumably Excel has to load everything into memory at once since it's immediately viewable, right?
Depends on the file format. An XLS or XLSB file can contain special markers for where each logical row starts, so it can randomly access rows. Both can also persist "calculation chains", which are a simplified dependency graph. The binary formats also store formulae in a parsed representation, allowing easy scans to see which cells have to be inspected if a file needs to be recalculated.
Aha! As far as persisting calculations, this makes sense. I guess in my head I was thinking "pure" data where everything opened was precomputed. Forgive my ignorance about database solutions, but do they not implement something similar?
But now you've got me thinking, it would be nice if libraries like Dask could allow for flagging of symbolic operations like this to be written to disk for quickly saving metadata where intermediate steps don't explicitly need to be saved.
Might be worth a try to look into Postgres. For the column problem, PG has a limit of 8 kilobytes for column width (instead of a count of columns). You can circumvent some of the limits by using arrays, types, JSON, or hstore. It also has PL/Python, which lets you code up queries containing Python fragments for data processing.
Meanwhile <side rant> open any document in Photoshop with a few layers and effects and the computer grinds to a halt, no matter the specs, no matter the year, through the ages. New hardware comes out? Booya, new Photoshop XYZ -> let's put your fancy hardware on its knees, begging for air.
>Meanwhile <side rant> open any document in Photoshop with a few layers and effects and the computer grinds to a halt, no matter the specs, no matter the year, through the ages.
Hasn't been my experience ever. Photoshop is one of the speediest image manipulation programs out there...
The monthly fee for Photoshop is around 1/100th of the old retail price. That works out at about 8 years of use before you have to 'buy' the software again. Seems fair to me.
And for those who don't like paying for Photoshop - which, given it's an astonishingly powerful piece of software, probably means "people who don't actually need Photoshop" - there's always cheap or free alternatives that provide about half the functionality.
>The monthly fee for Photoshop is around 1/100th of the old retail price. That works out at about 8 years of use before you have to 'buy' the software again. Seems fair to me.
That's because you only think of yourself. Creatives are one of the most struggling professions, and can have widely different returns per year, and salaries in different parts of the world can be much lower.
People who could afford to buy Photoshop at some point (perhaps even a used copy), later down the road might not be able to pay the subscription for a few months, but in the new scheme they lose access to the program altogether.
8 years? There are creatives that use 10 and 20 year old versions of Photoshop on Windows.
You pay per year. $120 gets you access to the photo bundle for one year. That's 30x cups of Starbucks drip coffee a year. Creative people who are 'struggling' blow more than that a month on weed, beer, coffee or Tinder. We all make choices that show what we prioritize.
Anyway, the subscription model for this kind of software works really well. Software keeps becoming better and so far Photoshop has been running circles around the limpy thing known as Gimp
P.S. I just got hit with the next year charge. I was annoyed so I tried installing CS 6. It's OK. It still beats Gimp but I would rather not drink coffee or smoke weed for a month to have money to pay for the yearly subscription for the photo bundle.
P.P.S. And if you are really really really struggling "creative" you are probably young and either go to school or have a friend that goes to school which can get you it under the EDU discount, which is peanuts.
> And for those who don't like paying for Photoshop - which, given it's an astonishingly powerful piece of software, probably means "people who don't actually need Photoshop" - there's always cheap or free alternatives that provide about half the functionality.
Now? Absolutely! 20 years ago this was much harder. Adobe used to dominate the market so that you used Photoshop even if you often didn't really need all its complexity. But these days, there is so much competition that you can probably find something that works well enough. I would assume that hurts their bottom line, but as a consumer, I am happy.
> Adobe used to dominate the market so that you used Photoshop even if you often didn't really need all its complexity.
Still does. Part of the reason is that in the past, they turned a blind eye on people pirating Photoshop. Every kid with even passing interest in graphics or photography got themselves a bootleg copy of Adobe tools to work on; it was an easy guess what they'll be using later, as adult professionals.
So true. I used to run European cross-docking ops, planning, inventory, forecasting, scheduling, in Excel back in the day. Successfully scaled by 100% YoY for three years. Was quite a lot of fun.
The issue I have with that approach is not using Excel per se. It is using Excel in addition to whatever system is in place in the first place (SAP, for example, is a pretty popular thing to circumvent with Excel sheets). That, and over-engineering spreadsheets to the point where nobody but the creator can use them anymore. At that point your data just gets suspect. And circumventing existing systems with local offline spreadsheets just screws up everything. Now combine these two.
But that has less to do with spreadsheets themselves, see my first point regarding cross-docking, and more with the application. Used correctly, spreadsheets can be incredibly powerful. And at least for my purposes, I have yet to encounter an analysis issue I failed to dig through using Excel. Sure, something like Python might have worked better, but consider me an empowered user (read: I have no clue about Python or SQL or...). And at least I knew I could trust that the analysis was done as intended.
> It is using Excel in addition to whatever system is in place in the first place (SAP, for example, is a pretty popular thing to circumvent with Excel sheets). (...) And circumventing existing systems with local offline spreadsheets just screws up everything.
I usually side with the circumventers on this one. This misbehavior happens for a reason, which usually is that it's impossible or infeasible to do the work with "proper" systems. Excel sheets on pendrives is what you get when people can't exchange data using "correct" software, or when people need to continuously iterate on the shape of the data set, while getting the schema changed in the "correct" system would involve a ticket to IT and 2 months of waiting.
I think instead of paying people to do a job and then doing everything possible to make that job difficult, companies should embrace the fact that people will work around any deficiencies of top-down systems, whether software or procedural. But I guess this tug-of-war has existed since the time of the first corporations.
True that. I used to do both back then, first as a user and then as a SAP system guy. For me it is always a death spiral: people don't trust the system or it is too cumbersome, they use Excel, the system gets even less reliable, they use Excel even more,... and so on. I always saw it as my role as a system guy to give users a system they can use and then get them to actually use it. Because when providing the system I talked to them first, so in the end it was kind of their system too.
And then there are examples like the production planning done in Excel because SAP PP just isn't good enough. Then the guy who created said Excel tool left. Years later, when production had to switch to weekend shifts, they couldn't, because no one could adapt the Excel tool. Once you reach that point you are screwed.
Agree but slight modification - goal should always be for the users to be able to do their main functional tasks in-application and maintain it as the system of record.
But there will always be a use for Excel as a tool for informal data-monkeying and analysis: exporting a table and doing some pivots to summarize, or using a different view. Non-technical users will always expect this capability and flock back to it. Not everyone's a programmer or has the interest to be; you'll have to pry Excel out of most accountants' cold, dead hands, for example.
You describe the ideal world. Tools like SAP or Business Warehouse suck at data crunching, and not everyone can use Python, for example. Even ops can, under the right circumstances, be OK-ish offline in Excel, as long as results are imported into e.g. SAP to assure data integrity.
I feel like PowerBI is a good solution for overcoming Excel's deficiencies while working well with Excel. It forces you to structure your data and define measures while being dynamic, efficient and friendly to less structured inputs.
I have to give PowerBI a serious shot. We have it at my current place but nobody is really using it, so I need some spare time to dive into it.
Now, the more I think about it, the combination of MS Office, Power BI and MS Dynamics with a shopfront from, say, Shopify would make for quite a nice e-commerce setup.
I do something along the same lines as you describe (build better tools in R/Python for larger data problems) but I have a deep aversion to Excel. It's proprietary, has horrible standards (it does not support native UTF-8 CSV files, for example), and is stuck in the '80s in terms of paradigm. And this is precisely the tool that prevents people from doing things more efficiently, because "they can do it manually in Excel even if it takes a long time". It makes people develop really, really bad habits.
Excel is a tool. It empowers those who don't know how to code to do their jobs more effectively than before without "learning to code". It's something you keep in your garage in case you need it. It depends on what their current workflow is. Sometimes I'll use Excel, sometimes I'll use R/Python.
What's important is that Excel gives you access to the computing power of modern technology with a lower barrier of entry. That's one of the most valuable things that many people fail to understand. Not everyone is a programmer and, for lack of a better term, "doesn't really care about UTF-8 and just wants the calculations to work". Also, in many cases enterprise IT in critical infrastructure is resistant to supporting R or Python due to their security models (firewall rules prevent install.packages('tidyverse'), for example).
This is what makes us specialists and someone they bring on board to help them out and explain to them about what the "best-use-case" is, and then watch as people try to cram a square peg into a round hole.
Well, if you're using Python, you're following an imperative/OOP paradigm which goes back to the sixties, so I don't know why you'd criticize Excel for using a dataflow model. I can understand not wanting to use Excel because it's proprietary (but LibreOffice isn't), and I can understand not wanting to deal with a lot of its legacy cruft, but for the stuff that Excel does well, it does it a lot more quickly, more interactively, and more intuitively for most people than a similar solution in Python. In fact, one of the key benefits of an Excel solution is that you can see every step of your calculation, which creates a degree of self-documentation which you don't get with Python, for example.
> that you can see every step of your calculation which creates a degree of self documentation which you don't get with Python, for example.
If you want to do that in Python then Jupyter Lab/notebook is a good solution to document your code and see the result of each operation. And way faster than Excel for large data tables.
Think of all the open, modern, nicely standards compliant tools that you use. Now think of the fact that Excel lets people solve their problems more easily than all the things you like.
I'm not saying it's good, but it's better than everything else (for the people who use it.)
I am talking about importing CSV files in UTF8. If I use Japanese characters for example it fails utterly to display properly unless I import it as "Text" first.
The secret trick for importing data into Excel (without actually writing a file in the Excel file format): don't try CSV - as you have already noticed, that will fail in all kinds of fragile ways depending on your character encoding and operating system locale.
Instead, convert your data into an html table and then paste or import that into Excel.
But not any html table, you need to have the appropriate proprietary css properties in there, such as the mso-data-placement:same-cell. Those are actually documented ...in the Office HTML and XML Reference published in 1999.
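If you want to generate such tables programmatically, here's a minimal sketch using only Python's standard library (the function name and sample data are mine, not from any official tooling; the `mso-data-placement:same-cell` property is the one documented in the Office HTML reference mentioned above):

```python
import html

def rows_to_excel_html(rows):
    """Render rows (lists of cell values) as an HTML table for pasting
    into Excel, keeping multi-line values inside a single cell."""
    body = []
    for row in rows:
        tds = "".join(
            # mso-data-placement:same-cell tells Excel not to spill
            # line breaks into extra rows on paste/import.
            '<td style="mso-data-placement:same-cell;">%s</td>'
            % html.escape(str(v)).replace("\n", "<br>")
            for v in row)
        body.append("<tr>%s</tr>" % tds)
    return "<table>%s</table>" % "".join(body)

print(rows_to_excel_html([["name", "note"], ["日本語", "line1\nline2"]]))
```

Because the values are HTML-escaped and the encoding is carried by the markup rather than the OS locale, non-ASCII text like the Japanese example above tends to survive the trip.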
I frequently go from SQL query results to HTML table to excel in a similar manner, but I haven't had issues w/ any proprietary CSS properties. Using Paste Special -> Match Destination Formatting it always comes out without any issues
But how do you handle sharing? The main issue with Excel, and its main downfall, is the sharing of data and versioning: person X has a version that's a few hours older than person Y's, and needs an email from person Y to keep doing her job.
If it's O365 Excel, then you can share and work on the same spreadsheet in a similar way to Google Sheets (seeing other peoples real time cursor in your spreadsheet etc).
Be very, very careful with this. There are some update/race condition bugs in Excel Online that result in multiple users causing bulk updates to incorrectly update the filtered/sorted view of other users.
This race condition happens often. In fact, about every time I try to rely on it with my team. Such a shame Google Drive isn't seen as a "trusted platform" in my client environment.
Sharing Excel sheets is extremely easy - you just mail them, or host them in some online place, like SharePoint if you've bought into the whole MS ecosystem like many companies do. Not sure what your point is there.
Versioning is of course more difficult, though Excel does support diffing in principle. I expect, though, that what most people end up doing is simply keeping track of versions manually, same as they would have in the days before Excel.
Honestly, people handle sharing via Google Drive/Dropbox/Enterprise solution plus file naming convention.
The current enterprise network I'm on? They block Box, Dropbox, and a lot of other solutions. They instead use Citrix's ShareFile and a NAS setup with file naming.
Also remember: surprisingly, most of this critical infrastructure planning isn't handled by hundreds of people; at most it's a team of 5 people who then present their findings and suggestions to the C-suite decision makers. It's not as big of a problem as many people think.
We tend to keep ours on Sharepoint - which means we could use Excel Online to work collaboratively, but usually we don't need them all at the same time, so open in Excel desktop and "check it out". If someone else needs it, they can just ping on Skype and see if you're done.
It is amazing, just not in a good way. Maybe I should try to find and post a link to that paper where it was pointed out that Excel was munging things like gene names because they get interpreted as dates in the 'untyped' cell input, and these things were showing up in published research.
Excel is a disaster, it will seem to work until it turns out it's doing something weird behind your back, or maybe you've made a careless mistake somewhere and it has zero tools to help you catch it. And you won't know until it's far too late. Please say no, use anything else - Python, JavaScript or heck even QBASIC or whatever - but just don't use Excel.
A friend of mine works converting spreadsheets to Python/Java programs. Basically, people at the company solve the problem with Excel, and when that gets too slow, he codes it up to read from a CSV and they're off to the races.
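That conversion is often less dramatic than it sounds. As a hypothetical sketch (the column names and the SUMIF-style aggregation are made up for illustration), a spreadsheet's core calculation can frequently be reduced to a few lines over a CSV using only the standard library:

```python
import csv
from collections import defaultdict

def totals_by_category(path):
    """Replicate a typical spreadsheet SUMIF: total 'amount' per 'category'."""
    totals = defaultdict(float)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[row["category"]] += float(row["amount"])
    return dict(totals)
```

The two columns a SUMIF would reference become an explicit, testable function, and swapping `csv` for a database cursor later doesn't change the shape of the code.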
It's amazing how far these business types can get before needing a programmer to optimize things. As a programmer, I tend to think in terms of logic, and then I tried to build a complicated budget tool in Python and the turnaround was just atrocious. So, I did it in a spreadsheet instead and was far more productive.
Spreadsheets are awesome, and whoever came up with the concept should get some kind of award. I can't think of a single software tool that has enabled more productivity than a spreadsheet.
And is it just me, or does hopping onto OpenOffice Calc or Google Sheets feel like a different device altogether? They all look like bikes, but accelerate differently, change gears differently, etc.
I love working with spreadsheets! In a past life, I was all about taking data from various sources, processing it, and spitting out CSVs that I would then work on with a data analytics person using Excel. Curious if y'all use a similar approach of generating CSVs, or do you go even deeper and actually create Excel sheets with Excel-native stuff in them (like formulas)?
What I specifically handle is Water. Starting from the natural system (weather generator + Hydrology) through to the human system (Demand model (showing increase over time and temperature increase) + system operations (dams, reservoirs, treatment facilities, pipes, etc.) to delivery to customers.
The big problem we answer is "How can we continue delivering water to people while under climate change uncertainty? What policies or new infrastructure can we build so we remain robust and resilient under these new conditions?"
My issue with spreadsheets is that they could be improved a lot with minimal changes, but nobody seems to do it in popular office packages. Making them more database-like and making table data first-class (at least you can make named tables in Excel on Windows) could be used to push people a bit more towards organised data, without changing how anything works.
Half of the mess that makes excel hell comes from the fact it's too easy to put two tables of data + some random constants on a single sheet and refer to them by H3. Now it's hard to add more data, hard to move anything, and hard to create space which expands to the next row with existing formulas.
Airtable (and Access) implements this idea, but unfortunately sacrifices the generic, free-form spreadsheet along the way.
I'd be even happy with clippy popping up with "It looks like you're adding a new table in the same sheet. Would you like to learn about using multiple sheets?"
The link is to "You Suck at Excel" by Joel Spolsky, which is an excellent tutorial.
I sincerely wish it had a different name, though!
I would love to send this video to some of my business colleagues inside a large enterprise. They need this information and they would enjou everything about this video. However, it would not be acceptable to send them a video entitled "You Suck at Excel."
If it had a more enterprise-friendly name I would even have a link and description to it in my email footer inside the enterprise. It would really help a lot of people.
Genius idea. I have a bunch of domains, combine it with the right subdomain... and we're onto something. Adding this to my list of things to do. Cheers!
I'd be over the moon to see one change to Excel. Native support for a language other than VBA. Perhaps the CLR. Perhaps Java/Typescript. Just something (optional) for programmers who want to use Excel and not want to deal with the garbage that VBA is.
Some workarounds exist, but AFAIK they require collaborators to also have the tool installed, which is dead in the water.
I know MS has considered it; I'm still pretty surprised they haven't followed through.
I've built plugins with VSTO (for Outlook) when I was at MS, but my memory is that it’s dependent on a plugin being distributed and not available for native use embedded into the excel workbooks.
You have to install Visual Studio, compile a plugin, register the plugin, and then from VBA you can call out to that COM interface if you'd like. Even if Excel users could overcome those hurdles, you still break the collaboration flow (i.e. just sharing an excel file).
Looking a little deeper, it looks like they're starting to support Javascript for the newer cross-platform add-in system (and also VSTO), but it looks like you still have to distribute your JS add-in via a web-service as opposed to being fully integrated into Excel and XLSM files.
Just being able to share the XLSM file and have users open it in Excel with no other installs or special network access is really the make-or-break for me and the non-professional-programmer user scenarios I've seen.
Microsoft would also like to see that change, that’s why they tried to kill VBA. The reason for not following through (in killing VBA, because they definitely support and prefer other extension mechanisms) was pushback from users.
Wait, really? I've been working in a thoroughly MS ecosystem for the first time this year, and I can't seem to find any evidence that they've tried to kill VBA based on how often I run into situations where VBA (or worse, VBscript) is the ONLY option, especially when working with excel's developer functions or writing expressions in SSRS utilities. Did the users push back because the options they were presented with were use VBA or have no scripting capabilities?
Although I don't like JS, I use that too, but it's pretty half-baked tooling compared to what Excel has. I would rather like to see just being able to point out some Google Cloud Functions and use those as spreadsheet commands. That would be powerful. Now you can but you have to do plumbing and you cannot reuse libs over sheets unless you make them public (last I checked, I could be wrong on this one but I read posts from as far back as 2011 where 'they are working on it'); you have to copy/paste. Yuck.
I wish someone (MS, ...) would do something, but I work on Linux so no Excel for me. And I do like very large datasets to be in the cloud anyway and not killing my laptop.
There should be more competitors in the space. I know there are a few, but they are not really competitors, just more niche / boutique products that attack a specific case, so you need to go back to Sheets or Excel anyway.
You've been able to control all the Microsoft Office and other apps like Internet Explorer through "out of process" OLE automation for ages, and Python has a great win32com module (part of pywin32) for controlling "in process" and "out of process" OLE controls / ActiveX components.
Intgrating COM into Python is one approach, but another approach is integrating Python into Active Scripting. (The age old extending/embedding debate.)
And Active Scripting (1996) let you plug different "in process" interpreters into the web browser and other multi-lingually scriptable applications, and call back and forth between (many but not all) ActiveX components and OLE automation interfaces more directly, without using slow "out of process" remote procedure calls. (Some components still require running in separate process, like Word, Excel, etc, which work, but are just slower to call).
>The Microsoft Windows Script Host (WSH) (formerly named Windows Scripting Host) is an automation technology for Microsoft Windows operating systems that provides scripting abilities comparable to batch files, but with a wider range of supported features.
>It is language-independent in that it can make use of different Active Scripting language engines. By default, it interprets and runs plain-text JScript (.JS and .JSE files) and VBScript (.VBS and .VBE files).
>Users can install different scripting engines to enable them to script in other languages, for instance PerlScript. The language independent filename extension WSF can also be used. The advantage of the Windows Script File (.WSF) is that it allows the user to use a combination of scripting languages within a single file.
>WSH engines include various implementations for the Rexx, BASIC, Perl, Ruby, Tcl, PHP, JavaScript, Delphi, Python, XSLT, and other languages.
>Windows Script Host is distributed and installed by default on Windows 98 and later versions of Windows. It is also installed if Internet Explorer 5 (or a later version) is installed. Beginning with Windows 2000, the Windows Script Host became available for use with user login scripts.
I dislike VBA too, but it would be easier to stomach if Excel at least had a decent IDE. Compare the VBA editor in Excel with Visual Studio and the difference is stark.
> Half of the mess that makes excel hell comes from the fact it's too easy to put two tables of data + some random constants on a single sheet and refer to them by H3.
That is a feature. If you can keep all the data you need on one screen, you reduce the cognitive overhead of having to switch back and forth between tabs. Of course, if you start needing to add more data you need to "refactor". But it does not seem a lot more different than starting with a prototype and having to change everything to support the features of a real product.
> it does not seem a lot more different than starting with a prototype
People underestimate this point. Spreadsheets are one of the best possible approaches to prototyping data-driven applications, thanks to their dual data/code nature and the ease of building and extending data types (just add more columns to a table); it's like building the application inside a debugger.
Non-developers don't have anything else that resembles a debugger, anywhere in the common approaches of software for end-users in industry.
Combine this with spreadsheet applications working as integrated environments (a single tool that handles all the computing needs of a project, without having to build a toolchain), and it's no wonder that it's the preferred method for non-programmers to build custom automation workflows when their needs aren't supported by any specific software.
> Spreadsheets are one of the best possible approaches for prototyping data-driven applications,
Bingo. Back when I worked in big corp Excel was the prototyping tool for the masses. If someone in a remote office built something in Excel that solved a problem that was useful to other offices my team would come in and use that work to build a real system. It was great because a lot of the requirements discovery work happened organically before we would even hear about the process/Excel.
Admittedly, we were a small team and could only take on so much work. This led to Excel being used beyond its capabilities which leads to a host of other problems.
> That is a feature. If you can keep in a screen all the data you need, you reduce the cognitive overhead of having to switch back and forth between tabs.
That's true, but it doesn't necessitate different tables living in the same row/column grid.
My main issue with them is that they often aren't reproducible as the data transformations aren't logged. As a computational biologist I often deal with spreadsheets that experimentalists give me. I have no idea how the data were transformed from the raw data, and often the experimentalists themselves can't remember either. That's why I infinitely prefer real programming languages for use in data manipulation -- you can look at the code.
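One lightweight remedy, sketched here in plain Python (the `pipeline` helper and the gene-expression example are hypothetical), is to make every transformation a named step so an audit trail travels with the result:

```python
def pipeline(data, steps):
    """Apply named transformation steps in order, keeping an audit log."""
    log = []
    for name, fn in steps:
        data = fn(data)
        log.append("%s -> %d rows" % (name, len(data)))
    return data, log

# Hypothetical example: filter raw measurements while recording each step,
# so anyone (including the experimentalist) can see how the data changed.
raw = [{"gene": "SEPT2", "expr": 3.1}, {"gene": "MARCH1", "expr": -1.0}]
clean, log = pipeline(raw, [
    ("drop negative expression", lambda d: [r for r in d if r["expr"] >= 0]),
])
print(log)
```

The log is the thing a spreadsheet never gives you: a statement of exactly which operations produced the final numbers, in order.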
PowerQuery is integrated into excel, gives a visual editing platform (not purely code), and is quite powerful in terms of linking up to any datasource, manipulating data, and giving you the output you need. All the steps are listed on the right side, and you can click back to any previous step in the transformation.
It's not super popular, but it does serve as a nice middle ground for people in the space who aren't strong in coding.
> Half of the mess that makes excel hell comes from the fact it's too easy to put two tables of data + some random constants on a single sheet and refer to them by H3
This is certainly true, but that is also where almost all the power comes from, its generic nature. I had a startup that tried to reinvent Excel for 6 months and we kept moving closer and closer to the excel "sack of undifferentiated data" paradigm...
That's what Excel Tables are for.
You get a table within your spreadsheet, and instead of using cell references like =SUM(C2:C7), you can use a structured reference like =SUM(DeptSales[Sales Amount]).
I have quite actively tried to avoid excel lately (which means also avoiding to learn new tricks...), and I am not that familiar with tables. Do the tables:
1. Enforce same data type per column. I.e., if you have a number in C4, you can't enter text into C5?
2. Enforce that if there is a formula, it is applied identically to each row?
(By enforcing I mean that there is _no_ way around that short of copying the data to new sheet)
Give me those two things (preferably in a sheet context instead of a table context) and decent version control, and I'll reconsider my policy of doing nothing but disposable ad-hoc work in Excel.
Unfortunately the answer is "no" to both #1 and #2.
It does do something close to those. Column data types are automatically carried over to the next row when inserting or appending rows to an existing table. But you're free to override it if you want for a particular cell
Same with formulas. When you enter a formula in one cell, Excel will automatically copy across the whole column. But you're able to undo that auto copy if you really just want the formula in that one cell. It'll also flag cells that have inconsistent (compared to rest of table) formulas.
So, no, it doesn't enforce. But it does encourage.
The warning you receive when a formula is not consistent is a nice feature, and an advanced user will quickly learn to be very sensitive to those warnings, will spot them quickly, and avoid causing them when possible.
To strictly enforce that the formula must be the same in each cell of the column would not be very excel-like, I can't see it being done.
Yep. The problem is that I have this conspiracy theory that Microsoft has designed Excel to be the ultimate booster of the Dunning-Kruger effect, so that most people think they are advanced users [1] while they actually have no clue what they are doing. All the while giving no protection whatsoever against those Dunning-Kruger cases.
Totally. I know about it. The problem is that approximately nobody using excel does. (Yes, finance, data processing, analysts etc. do. Compared to the numbers with excel installed - still ~nobody.)
We (https://sheetjs.com/) almost exclusively deal with those types of complex applications built in spreadsheets
> they could be improved a lot with minimal changes
They can be improved for very specific use cases at the expense of others.
> Making them more database-like and making table data first-class (at least you can make named tables in excel on windows) could be used to push people a bit more towards organised data, without changing how anything works.
There are "database functions" like DAVERAGE, PivotTables and other systems for dealing with more structured data. Getting users to use them is a challenge outside of the scope of the tool. "You can lead a horse to water but you can't make it drink"
> Half of the mess that makes excel hell comes from the fact it's too easy to put two tables of data + some random constants on a single sheet and refer to them by H3.
Half of the reason Excel is so popular is the loose structure and the power that it enables.
> Now it's hard to add more data, hard to move anything, and hard to create space which expands to the next row with existing formulas.
Most of the hairy workbooks start as a solution to a problem at hand and eventually accumulate cruft after people try expanding it, not too dissimilar to other forms of software development.
> Airtable (and Access) implements this idea, but unfortunately sacrifices the generic, free-form spreadsheet along the way.
You either enforce the constraints and weaken the platform, or you give users the power to do what they want.
> You either enforce the constraints and weaken the platform, or you give users the power to do what they want.
I disagree with that. All the pieces are already there. There's nothing to weaken by education. But nothing tells the new users about them. They see the main screen and rarely ever know about multiple sheets. I've seen people using excel at work for years without knowing that. You don't have to take anything away from them, just go: hey, did you know there's a better way?
I know it's possible because I introduced a few people to named ranges, sheets, and data tables. All were happy to apply them later. (On their own)
Clippy, Genie, Plany, Peedy the Parrot, Merlin the Wizard, Milton the Bear, Oscar the Cat, Max the Search Doggie, and all of their other happy friends were cheerfully retired in 2009, and are now merrily frolicking on a nice farm in the countryside with a very loving family. Or so we are told.
My endgame is to build an Excel+Access-like dev tool. I'm working on a relational language as the foundation (http://tablam.org), because Excel has another problem: the Excel UI paradigm (reactive) is not the VBA macro one (almost OO), so new functions don't follow the way of Excel.
I also think a spreadsheet is an interface to a database/source and must not have issues handling gigabytes of data. So internally it should be, say, a SQLite db with the spreadsheet control on top. Make it virtual and reactive and it's done.
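As a rough illustration of that idea (the table name and helper functions are invented for the sketch), Python's built-in sqlite3 can already play the storage role: the grid becomes a windowed query, and a "formula" becomes SQL:

```python
import sqlite3

# Toy "sheet" backed by SQLite: the grid is just a query result, so large
# data stays in the database and the UI only fetches what it can display.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("EU", 120.0), ("EU", 80.0), ("US", 200.0)])

def visible_window(offset, limit):
    """Fetch only the rows the virtual grid needs to draw right now."""
    return con.execute(
        "SELECT region, amount FROM sales LIMIT ? OFFSET ?",
        (limit, offset)).fetchall()

def formula_sum(region):
    """A 'cell formula' expressed as SQL instead of =SUMIF(...)."""
    return con.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?",
        (region,)).fetchone()[0]
```

Scrolling then just shifts the OFFSET, and recomputing a "cell" is re-running its query, which is what makes the virtual/reactive combination plausible.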
> at least you can make named tables in excel on windows
I built a spreadsheet that loaded external data sources (filtered log files from a production system) into Excel tables. I then combined two log files into another table, and created a pivot table from this, which I then filtered (date ranges etc) and analysed.
That worked great at the start - until I came to update the external data and found the pivot table didn't update. Then when I force an update it lost my groupings, filtering etc.
I suggest you use Power Query to load external data. If you do, pivot tables and tables connected to those queries should all update automatically on refresh.
Power Query is really the best feature in Excel to me, eventually users will stop copy pasting from Access to Excel and I'll be happy.
It had its problems, but Apple's Numbers product got this right I feel - it was a kind of blend between desktop publishing and spreadsheets, where you could scatter arbitrary tables of data around a page and then have them refer to each other. I suspect a lot of people who are throwing multiple tables of data in the same sheet are doing so because they want both of them to be visible at once.
In my opinion, "No Code" tools will fill this gap: building real-world apps while staying as simple as Excel.
While Airtable is just a DB as a service, a real no-code tool would ideally have all the basic building blocks:
a. DB as a service (like Airtable)
b. BPM capability -> to add workflows to the data
c. Ui Building to abstract data
d. API integration (may need advanced learning for power users)
Excel is the closest we have come to no-code tools, and it's been around for years.
MS Access actually has a fairly decent way to build apps without really touching code, but it requires understanding relational databases well enough to use it properly. Most (non-technical) people end up using Excel as a database instead.
I haven't looked at Access lately, but when I was forced to use it extensively, there were limits to how far you could push input validation without writing VBA. So I suspect that most Access databases have severe issues with data consistency. But at least the data is structured.
No, I fully understand that this is great for many users and that it cannot be made harder in any way to make whatever mess you want on a single sheet - or excel would die. I only want it easier to not do that. (There's nothing guiding people to better design right now)
That's by design. People don't want to be "guided to better design," they want to get their math done and over with, leaving some trace (the Excel sheet) for the next time they or somebody else needs to revise that math.
I'd rather want it to stay easy, but offer better refactor tools to clean up the model after the fact.
My main complaint against spreadsheets is that, to get a robust and clean model, you must plan it from the start, negating the main value of using a spreadsheet to begin with. When moving data and formulas around, it's too easy to break the formulas and not even notice.
named ranges and DGET "sql-like" queries are in google sheets as well. (or perhaps what you meant is that few people know about these things, and don't use them, so you end up with unreadable messy code...)
Yeah, I meant that they're not right in your face and ready to be used. I know it's hard to force people to use new methods, but if anyone could do it, MS / Excel could.
It's all possible though, if you know where to look (I think excel even has table sheets now?)
I see what you're saying, but this is actually a deliberate usability principle: make the basic features easy to access (visible without clicking into a menu), and put advanced features inside menus. Then basic users who don't need more advanced features can do what they need without getting lost in a super complex UI, and power users can still get to the advanced stuff.
If people want to do advanced stuff with software, they really do need to invest the time to learn where the advanced features of the software are. I think this is reasonable.
I worked with excel for some years early in my career. I did not know how to program back then beyond if else and for loops.
We used it for budget estimation on multi-million projects, up to 100M. I have seen other uses in the corporate world across many departments.
IMHO I think it’s good for specific use cases. But it is routinely abused beyond that.
The key shortcomings of Excel use by non-programmers for key business applications:
- How do you test that your calculation is correct after an update? (The same argument holds for business applications without tests.)
- How do you maintain knowledge of the inner workings without having to follow the arrows around cells? There are probably good practices, but how likely are you to enforce them in your product/team?
- The data/computation is spread across user space. If your app is useful, many people will want to use it. How do you manage updates and bug fixes beyond "it's the user's responsibility"?
- How do you avoid the black-box effect: mission-critical software that no one knows how to touch inside without a full rewrite?
- How do you convince stakeholders that they need to migrate to an adapted solution while they have a working one now, and are often oblivious to the hidden costs of users copy-pasting data / reformatting for hours to fit an existing tool that is no longer adapted?
The data frame structure in R or Python solves a large portion of these issues. Yes, you need training; it's the same for Excel, where avoiding the infernal-machine case requires the same concepts.
Why not train people to program with data frames, with the option to gradually extend to a web app without a full IT project that needs three levels of validation?
Data frames are going to be huge. This year I started using them for a personal project -- python, jupyter notebooks, and pandas -- and was astonished at how easy it was to get running, and how powerful.
I feel like lots of number people, once exposed to it, will realize that there's a place to go after Excel that's worthwhile.
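For anyone curious what that first pandas experience looks like, here is a minimal sketch of spreadsheet-style work in a DataFrame. The project/budget data is invented purely for illustration:

```python
import pandas as pd

# Invented data standing in for a typical spreadsheet of project costs.
df = pd.DataFrame({
    "project": ["A", "B", "C", "D"],
    "budget":  [120, 80, 200, 45],
    "spent":   [100, 95, 150, 40],
})

# Spreadsheet-style work in a few lines: a derived column, a filter, a sort.
df["remaining"] = df["budget"] - df["spent"]
over_budget = df[df["spent"] > df["budget"]]
ranked = df.sort_values("remaining", ascending=False)
```

Each step is an ordinary expression you can re-run, inspect, and put under version control, which is a big part of the "place to go after Excel" appeal.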
There's nothing WRONG with the spreadsheet model, and never has been. It's immensely powerful and useful, and the growth in capability over the life of a product like Excel is remarkable and enviable. Excel is perhaps Microsoft's greatest product.
The problem with it -- and there IS a problem -- is really a problem of applicability. Excel, like Lotus before it, is the first place many people encounter the ability to create their own logical conditions.
For a huge subset of these people, it's also the LAST such tool they learn. And now they have a hammer, so everything is a nail. I don't just mean finance people who live and die by spreadsheets; I've met engineers who would solve problems in Excel macros that would've been better attacked in perl or Python.
The other "problem" is the degree to which a horrifying number of organizations end up depending on very, very complex spreadsheets full of arcane and undocumented formulas and macros, and for which change control is "save-as."
But, again, none of this is a problem with the product itself, or with the idea of spreadsheets.
I've been showing a team of economists working on some economic modeling how to use git and R. They were amazed that they could keep working on a file without creating backup copies, or worrying about team mates overwriting their work. It's a work in progress, but they are eager to learn and understand the value there.
We sell & implement a project management financial metrics tool (supporting earned value analysis; google if curious). A key input is always the actual costs of work done, which has to come from the financial system of record.
A horrifying amount of the time, the actual path is through goofy undocumented Excel sheets, because nobody knows how to extract it natively from $FinSys.
I think there is a tipping point in complexity where Excel needs to be replaced by something else. In the IT world I've seen businesses that have essentially created bespoke ERP applications for themselves by evolving a web of interlinked Excel files over years.
A food distributor I worked for was handling their upstream vendors, warehouse inventories, truck inventories, customer list, order history, deliveries, invoicing, and more this way. These businesses suffer because the "Excel ERP" is constantly breaking or losing data, and no one knows how it really works anymore.
The food distributor eventually migrated to an actual ERP made for their industry, and the amount of effort and stress saved at all levels of the company was massive.
The sheer number of everyday business use cases that can be solved quickly and efficiently with spreadsheets is pretty astounding. Of course, there comes a point when heavily-used spreadsheets become intricate behemoths on which core business functions are run, and then things can get bad in a hurry. The challenge many programmers face is thinking that every business problem is best solved with code, when in reality the majority are solved faster, more efficiently, and more maintainably with a spreadsheet. Then the rest are worth developing "real" solutions for.
One of my first 'real' programming tasks involved replacing an Excel sheet with a simple application.
Because the Excel sheet was doing a lookup in a HUGE dataset (I am talking about a few million rows of raw data) - basically a full-text search in a database with unstructured address data.
The idea was to fuzzy join two datasets, and to make manual joining a bit easier with search functionality.
It worked fine on the dev's machine, but on users' toasters it took an hour to load.
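The fuzzy-join idea can be sketched with the standard library's difflib; the addresses below are invented, and a real tool over millions of rows would need indexing/blocking rather than pairwise comparison (which is exactly why the Excel version was so slow):

```python
import difflib

# Two invented address lists to be matched approximately.
canonical = ["12 Main Street, Springfield", "9 Oak Avenue, Shelbyville"]
messy     = ["12 Main St, Springfield", "9 Oak Ave Shelbyville"]

# For each messy entry, find the closest canonical address.
# Note: this pairwise approach is O(n*m) - fine as a sketch, hopeless
# at millions of rows without blocking or an index.
matches = {
    addr: difflib.get_close_matches(addr, canonical, n=1, cutoff=0.6)
    for addr in messy
}
```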
> The challenge many programmers face is thinking that every business problem is best solved with code, when in reality the majority are solved faster, more efficiently, and more maintainably with a spreadsheet.
It's not like you can't do both. Excel supported VBA macros for a long time. Now it supports JavaScript.
It still supports VBA. I've seen multiple manufacturing facilities which are absolutely reliant on VBA macros to be able to produce and ship their products.
For the last few years I've believed Excel is the most important piece of software written, no hyperbole. The reason is that it's so damned accessible. Software is eating the world, and letting people who don't write "true" code still kinda write code opens up software's productivity and automation benefits for a bunch of use cases and an insane amount of people.
For programmers, the analogy is Python. It's almost never the best tool for any specific thing, but it's a really solid tool for a ton of things. It's easy enough to learn and has enough depth to keep learning. It's easy to prototype a quick answer, and it can not-too-painfully grow into a large complex system. I think that's why you get people from all backgrounds using Python. It enables people who don't know much about software to start writing software.
Excel's the same way, but with an even lower barrier (and probably lower ceiling). It enables people who don't know anything about software to get many of software's benefits. That's huge.
Just like Python can be great for a biologist whose focus is biology, not software, Excel is great for the accountant whose focus is accounting, not software. It makes them a programmer, or at least close enough for many, many purposes.
My biggest problem with Excel as a "solution" is that document control is kind of an afterthought. You've got X number of people in your organisation and they each have these adhoc collections of Excel spreadsheets. Usually the devil is in the details and you'll have one person who has calculated gross revenue one way and another person who has done it another way. The two approaches are never reconciled, but their output will invariably be compared as if they are discussing apples and apples. I've seen so much analysis done that was really just garbage because nobody understands where the data came from and what it means.
It's not really fair to blame Excel -- it's a calculation tool. However your solution space needs to address this problem and I very, very rarely see it happen.
It’s absolutely fair to blame Excel. It obfuscates the code in favor of displaying the calculated values - it’s designed to hide the code - which, along with a number of other design decisions, makes it really hard to build an error-free spreadsheet beyond trivially sized cases.
Further, its design makes reproducible data practices difficult - in contrast to R or Python, which do a lot to separate code from data and let you re-run the same code on new or updated data. Python and R (and other non-spreadsheet tools) encourage practices that keep raw data pristine, with work being done on copies of the data. In contrast, it’s really easy to make mistakes in Excel in ways you’ll never catch. Sorting within filtered columns is a good example. Did you add another column after creating the auto filter? Surprise: data in that column won’t sort with all the other data when you re-sort one of the original columns. Just like that, poof, silent data corruption with no easy way of reverting if the error isn’t caught quickly.
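The "keep raw data pristine, work on copies" practice can be sketched in a few lines of pandas; the data here is invented, and in practice `raw` would come from something like `pd.read_csv` on the original file:

```python
import pandas as pd

# The raw data is loaded once and never mutated (built in memory here;
# in practice it would come from pd.read_csv("raw_data.csv") or similar).
raw = pd.DataFrame({"value": [3, 1, 2]})

# All work happens on a copy, so re-running the script on updated raw
# data reproduces the whole analysis from scratch.
work = raw.copy()
work = work.sort_values("value").reset_index(drop=True)

# The original stays pristine - unlike in-place edits to a spreadsheet.
assert list(raw["value"]) == [3, 1, 2]
```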
I almost cracked the org-mode code. Almost. Used it for some sheets but now I only use it for my journalling. Hopefully it will keep the embers hot until I decide to pick it up for real again.
I feel the same way about Access, too. It doesn't matter whether they're "real programming", it matters that they let real people solve real world problems for themselves, who otherwise would have to pay real programmers real money to develop real applications.
I'm in that category of people that regularly have to unwind the web of Excel or deconstruct the Access databases that grow to rival SAP in terms of intricacy...
... I actually agree with the overall premise that these tools do let non-tech types do real work and I think department power user types should have these things.
But we must also recognize that these tools are a bit like a black hole, too. Not unlike many applications created by professional developers, these users will tend to continually add features and bells and whistles until they pass the event horizon that exists between good judgement and bending these tools to be what they aren't (the most common is turning Excel into a database). On one side of that horizon you can pull back and reasonably find better technology to implement sophisticated capabilities; on the other, you start to lose efficiency as the errors and issues with the misuse of these systems overwhelm the benefits. Worst part is... you never really know when you've passed through that horizon... until you're spaghettified, that is...
So in-house technologists need to be aware of these things and give good support, and then also provide the technical judgement on when these approaches start to break.
I once built a Rails app that translated a giant Google Sheets document into a web application.
What I found really fascinating was that:
- 95% of my development time was spent building CRUD, access control, and UI that was already more-or-less provided for free in Sheets.
- 5% of my development time was spent writing unit tests and services that implemented the actual logic and formulae in the spreadsheet. Even though this was the tricky, business-critical "thinking carefully" part, it was also the least time consuming.
It just goes to show, even compared with Rails, applications like Excel and Sheets give you a LOT for free, out of the box, accessible to everyone.
Has Access solved the corruption issue that plagued it for tens of years? While you can connect it to a "real" database engine trivially, a common use-case is via a single MDB file that could become damaged.
You could damage the MDB a number of ways including:
- Leave Access running before shutdown (or power loss)
- Intermittent network outage to the file server
- Multiple users trying to access the same MDB (or anti-virus scans/locks, even on another user's machine)
The biggest headache though with Microsoft Access was never the product itself. It was that the product didn't really have a natural evolution. You'd start with a single MDB/single employee, but one day you'd need two employees or more (and security, and more tables, and this and that), and while you could migrate the MDB into a real database engine ($$$) and use Access as the front end, the record locking was funky and scaled poorly (plus control was limited).
The whole product felt a bit like a mouse-trap. A nice shiny piece of cheese, that genuinely tasted good/worked well, but as soon as you tried to move it SNAP. I don't dislike Access, but it was always painful when a business outgrew it (whereas Fortune 500 companies live and die on Excel).
Access and FileMaker made sense before the advent of open source, specifically Rails, Django, PostgreSQL, and MySQL. Both products make little sense now.
These are much harder for non-programmers to use than Access. With Access, almost completely non-technical people can set up their own database and make the queries they need to answer their own questions. With Postgres accessed from a general-purpose programming language, non-technical people need to hire someone to help them with every basic task.
As far as I can tell Access has nothing to do with making websites, so it’s unclear what Django or Rails has to do with anything.
> As far as I can tell Access has nothing to do with making websites, so it’s unclear what Django or Rails has to do with anything.
You can make apps with Django and Rails. They're not just "websites". Web apps are also still useful even when they're not public facing.
> These are much harder for non-programmers to use than Access. With Access, almost completely non-technical people can set up their own database and make the queries they need to answer their own questions.
That's the thing. Access and FileMaker aren't typically run by non-technical people. Yes, they aren't initially programmers, but they're usually technical people. FileMaker & Access users, whether they realize it or not, become programmers. I feel that you're confusing Access & FileMaker users with users of Excel.
> With Postgres accessed from a general-purpose programming language, non-technical people need to hire someone to help them with every basic task.
A relational database is not a big leap from either Access or FileMaker.
I know all of this because I used to work closely with a team of these people, and at some points I've even helped maintain their code.
> For example, my anthropologist parents used Access for analyzing their manually gathered census data for a small rural village.
That's surprising. Most of the time, this is what Excel & Wordpress are used for by non-techies, since both FileMaker and Access feel daunting to most of them. Maybe this is exclusive to museums? This is anecdotal, but I've worked in a lot of different industries, and in all of them everyone maintaining FileMaker or Access were also knowledgeable enough to code in those platforms ie. they were techies before they started using FM or Access
Yeah, I hear this a lot, but it really does not make a lot of sense. I hear the same about WYSIWYG builders, CMSes, and 'codeless/low-code' platforms, but they are all alive and kicking despite programmers thinking it is much easier to whip up something with Rails/Django.
Those platforms are very far removed from 'the normal user'; you cannot just install ONE (!) installer file and then open a visual drag & drop editor and whip up what you want. In the worst case you have to install several things and some of this need to be configured; you already lost 99% of the business people there.
You forget 'little things' like: they have to know HTML/CSS/JS as well, or at least know how to add and work with Bootstrap. For business use you need Bootstrap controls; you need to know how to add and integrate them (people are not going to understand what it means to just 'bind to the onclick event'), while in the aforementioned packages that is just drag & drop. Installing new controls is too.
Then, if they got as far that they run into having to securely deploy it somewhere. It is all just too hard.
The web made them less used - not because of Rails/Django, but because, legacy-wise, they are less well suited to the web. Still, there is a huge market for them, more and more tools are appearing that solve some of these issues for the web, and quite a substantial group will choose them over hiring 'real coders' to make things quickly. There are plenty of web shops (with big clients) doing their db work with FileMaker (and FileMaker does support the web).
Citizen developers in an organization are not a bad thing. But it is important to properly support them, with the right tools and training, but especially when the harder problems need to be addressed by specialists.
I second the "right training" part. I think any organization working with any kind of data should invest in making all their office employees Excel power users.
Wrt. right tools, I think Excel is the right tool. It may lack safety affordances, but it does its job cheaply and efficiently (and you already paid for it since you probably need to open Word documents anyway).
As for "harder problems addressed by specialists" part, it happens, but I'd argue not that often. Are they using Excel to half-ass CFD sims of their rocket engine nozzles? Sure, they probably could use a proper software for that. Are they doing their own specific munging of customer or process data? Their workflow is probably too specific and too fluid for it to be shoehorned into a "serious" application without a tremendous loss of productivity.
It's interesting how often people's response to the manifest problems caused by people using Excel in unfortunate ways is to either suggest tools with much steeper learning curves or to suggest additional Excel features. 99% of these problems can be solved by proper training and standards appropriate to the problem and context at hand.
For instance:
- Always separate and label inputs, calculations, and outputs.
- Document where source data has come from, where a cell has had an ad-hoc adjustment made, and what formulas do.
- Use some type of version control and don't keep loads of concurrent versions floating around on email and local hard drives.
- If the spreadsheet loads data from external sources, try to make that load automatic and live to prevent staleness.
- Apply consistent formatting rules.
- If data is tabular, put it in an Excel table. If data is tabular, we always run the same queries on it, and it is large, then we move it to a database - but that rarely happens.
- Make it clear who owns each spreadsheet and is responsible for keeping its data and its functions/formulas working.
Do all of this first, only then start thinking of replacing Excel with something else.
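For the "move it to a database" case, a minimal sketch of what that graduation can look like, using Python's built-in sqlite3 module; the sales table and its columns are invented for illustration:

```python
import sqlite3

# An in-memory SQLite database standing in for the database a heavily
# queried spreadsheet table would graduate to.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 175.0)],
)

# The "same query we always run" becomes a reusable, auditable statement
# instead of a formula buried somewhere in a workbook.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```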
The hardest part of everything I do in my life is making other people follow rules, or any kind of guidelines. If you can get even one of the rules you described consistently followed by an organization of 10 people for 6 months... please tell me how you did it, because then I am going to be a millionaire.
I am the team lead of a small dev team; if you force people to follow rules they will start following them maliciously, up to the point where no work gets done but your spreadsheets are perfect.
If we lived in a world where your suggestions were followed Excel would indeed be an OK tool. Here in the real world however Excel has just the right combination of power and usability to shoot off every left foot in a five cube radius, and frequently does.
Note that the reason why Excel is popular is precisely because people "don't try to fix the user", but instead develop software solutions tailored to address (author's misunderstanding of) a specific use case users have. The problem is, the specifics of that use case changes much faster than the dedicated software can be updated.
Processes that are thoroughly understood are easy to automate. Those that aren't, require competent people with flexible tools. Excel is a flexible tool. Competence can be gained through training.
In our context, we use Excel as a team in a work context where our rules make our specific tasks easier. It is much easier for people joining who are already proficient with Excel to follow some formatting guidelines than to invent a custom tool and learn to use that.
Hmm, I'm not sure about that! Formatting in Python, sure - because it's not designed for outputting colours and typefaces. But the popular Python package Pandas makes Excel-like tasks a breeze. I'll often help colleagues who can't code to do something with their data that is nothing more than a single "groupby" or "apply" expression in Python+Pandas.
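As a concrete (invented) illustration of that kind of one-expression help - the departments and hours below are made up:

```python
import pandas as pd

# Invented data: the kind of sheet a colleague might bring over.
df = pd.DataFrame({
    "department": ["ops", "ops", "sales", "sales", "sales"],
    "hours":      [5, 3, 8, 2, 6],
})

# The whole task is often a single expression:
hours_per_dept = df.groupby("department")["hours"].sum()
```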
I always like these threads where programmers talk about tools like Excel and say they’re fine for small stuff but grow out of hand, as if 90% of codebases built by professional programmers aren’t exactly the same.
But professional programmers (emphasis on professional) are well aware of how codebases can grow out of hand and employ entire categories of theories, strategies and tools to keep their software maintainable, even when under delivery/time pressure. Excel doesn't really have much in this category of theories and tools.
It's unfortunately not available in the Mac client, but Excel also has an amazingly powerful and useful tool called PowerPivot/Get & Transform[1].
You can connect to external data sources such as CSV files, Excel files, any database with an ODBC connector, APIs, all kinds of neat things. Ingest that data into your Excel file, create and enforce constraints and relationships within your data model, get incredibly robust data munging and analysis functionality[2], and then expose all of it as a PivotTable. The functionality itself bypasses Excel's limitations such as max data size and computationally inefficient formula implementations, as it uses a separate storage and computation engine: a highly compressed columnar data store.
The PowerPivot work is also mostly transferable to PowerBI and Analysis Services. Taken together, you've got all the tools to apply progressive enhancements for end users. Let them create their Excel-based stuff. When it starts to become more mission critical, non-performant, or error-prone, provide them with support to clean it up in the ways that video from Joel Spolsky mentions. When it hits growing pains from that, refactor it further to leverage the built-in data modeling capabilities to enforce some integrity, automation, and potential data volume scaling. And when you hit growing pains with that, or the underlying process/usage finally matures to a state of stability, or you need to address security/access/audit-ability concerns, transfer that data model and everything to either PowerBI or an Analysis Server deployment and migrate the management to IT.
I don't see it in practice very often, but it's an incredibly effective and frictionless way to both enable your business users to innovate their work processes via the tools they know, while also providing a non-disruptive way to mitigate your business becoming reliant on apocryphal spreadsheets being passed around to support critical business functions. And by design alleviates many of the causes of "automation" projects failing.
I used to work at a bank that built a huge integrated application development and deployment stack on Python. One aspect of it was a custom library implementing a Directed Acyclic Graph, basically a way to link up properties so that when the result of one property changed, any dependent properties on the DAG would automatically recalculate. This was explicitly implemented so that we could directly translate the logic from Excel sheets into Python code.
A lot of our projects were built to accept an excel sheet as input, then generate another Excel sheet as the output. The idea was the logic and data manipulation and transformation should be in source control, but the Quants would still have Excel available to build graphs, pivot tables and do statistical analysis on the results.
The DAG also turned out to be pretty handy for implementing web applications and all sorts of other apps though.
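As a toy illustration of that reactive-DAG idea - a from-scratch sketch, not the bank's actual library - here is a minimal "cell" whose cached result is invalidated and recomputed when any of its inputs change:

```python
class Cell:
    """A toy spreadsheet-style cell: a plain value or a formula over
    other cells. Changing an input marks every dependent cell dirty,
    so its result is recomputed on next access - like Excel recalc."""

    def __init__(self, value=None, formula=None, inputs=()):
        self._value = value
        self._formula = formula
        self._inputs = list(inputs)
        self._dependents = []
        self._cache = None
        self._dirty = True
        for cell in self._inputs:
            cell._dependents.append(self)

    def get(self):
        if self._formula is None:
            return self._value
        if self._dirty:
            self._cache = self._formula(*(c.get() for c in self._inputs))
            self._dirty = False
        return self._cache

    def set(self, value):
        self._value = value
        self._invalidate_dependents()

    def _invalidate_dependents(self):
        # Propagate dirtiness down the DAG, stopping at already-dirty nodes.
        for dep in self._dependents:
            if not dep._dirty:
                dep._dirty = True
                dep._invalidate_dependents()


# Usage: c recalculates automatically when a changes, like a spreadsheet.
a = Cell(value=2)
b = Cell(value=3)
c = Cell(formula=lambda x, y: x + y, inputs=[a, b])
```

The same structure is what makes the pattern reusable beyond spreadsheet translation - web apps and other reactive UIs are just more consumers of the graph.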
We also found using Excel as an output to be a powerful design pattern. Excel is a great language for explaining the flow of calculations. We used to have WPF screens where a portfolio manager / trader / analyst could right-click on a number and select "show details" and he/she would get an Excel workbook where that same number would be selected and they could see the derivation by following the cell's precedents. It was not always feasible to show a complete derivation, but we'd try to show a useful amount of elaboration. This was a heavily used and loved feature, often leading to the users making edits or additions to these detail sheets and giving it back to us as a spec. Building the 'details' Excel books programmatically was a lot of work, but part of the payoff was that we also used them as a nightly check on our primary system, i.e., we'd automatically run 'show details' on everything and make sure the 'detail' result matches the primary system result. That comparison caught many bugs. Kent Beck: "More important than tests vs types is the principle of double checking. If you say something twice in independently derived ways, you're more likely to be correct than if you just say it once. Tests are a form of double checking. So are types."
I love spreadsheets. When I'm working out a method for analyzing some dataset, I just hack it out in Calc. Or if it gets too big, Excel. Once I've blocked it out, I move to MySQL. But I can do any database operation in Excel, or even Calc. Maybe not efficiently, but well enough.
But I've also learned that basic Linux tools (grep, sed, tr, awk, sort, uniq, etc) are far more efficient for cleaning and preparing data for spreadsheets or databases.
And then I use spreadsheets for final analytic steps, stuff that SQL doesn't do efficiently, and for charting. I could learn Python and R, I suppose, but SQL and Calc/Excel have always been enough. And Gnumeric, sometimes, because it can do some amazing charts that the others don't.
I was thinking of min-max. You can select an area of cells as a series, and so display numerous min-max lines. Also matrix and contour. Those aren't in Calc 5. Maybe they're in Excel now. I have an old Windows 7 version.
Excel is powerful. It does so many things really well. Sharing data between sheets in real-time isn’t one of them though (maybe I just haven’t found the right tool, so we went ahead and built one).
I worked at a large fintech company a few years ago. During my interview Excel popped up as a topic, the interviewer quipped “Excel is terrible!”, he was referencing the heavy reliance of his customers on spreadsheets rather than the “better” functionality they offered via their platform. A few years before I would have agreed, but Excel really is amazing, it allows almost anyone to just-get-things-done.
There are plenty of cases where Excel projects had grown to the point where specialised software would be better for the business and the users... and if you’re in an industry with heavy and advanced usage of excel (like the fintech space), it’s a great place to mine solid ideas for a startup. Just don’t try to recreate everything Excel does! Focus on areas it doesn’t excel in.
Having worked a decent amount in Excel, here are my key issues:
- VBA macros and sheets are two distinct paradigms. Within an Excel file, it is not always clear how the two interact, and working it out requires meaningful digging.
- Once a numerical model has been calculated in a sheet, it is difficult to scale it. Yes, it is possible to copy sheets, but if you make a change or want to do something hundreds of times, it's a pain.
- Data integrity is a problem. Oops, I pressed the wrong key and deleted some data. Oh shit, I don't have Git to compare what was changed.
For a long while, my dad's job was as a consultant, and his job was to convert Lotus spreadsheets to Excel spreadsheets (and sometimes small databases). That was it. The spreadsheets he worked on were responsible for moving millions and millions of dollars around.
One of his jobs was for a major movie studio updating their sheets that calculated royalty payments. Every actor that ever worked on a show distributed by that studio relied on the accuracy of that single spreadsheet for their "money mailers".
Funnily enough, Excel is a programming language. I don't mean VB Script or macros. I mean the equations, ranges, and constraints - that's a programming language right there.
A cell is simply a computational variable (as opposed to the notion of variables in lambda calculus). A named range is a data structure (a struct). The rest is term rewriting.
Isn’t the most relevant distinction when comparing a spreadsheet to programming that it’s non-procedural? Writing spreadsheets reminds me of writing Makefiles. There’s no start or end, just a bunch of declarations about relationships. And it just... happens.
Reactive programs don't have to be declarative. You can express a reactive program imperatively.
The defining feature of a reactive program is that relationships between inputs and outputs are automatically tracked, and changing an input will automatically update the dependent outputs.
Facebook had a whole experimental language (now abandoned) for writing reactive programs with imperative code: http://skiplang.com/
The portion of comment I was replying to was "no start or end" . This is reactive programming.
Having said that, other than Excel and some cute UI things, I've not seen much utility in reactive paradigms. Hence I am bearish on reactive programming. Someone please correct me.
And that's why, when you start to use VBA or JS in Excel, it's a fail.
Excel power users (I am not one of them) know how to solve almost every problem they are presented with using only Excel - like SQL with relational data, with orders of magnitude better performance.
The main remaining issue is scalability, when the dataset gets too big.
At my company we've built a slack bot that sits in front of a google sheet to manage transactions (eg. borrows) in our office library.[1]
I'd love to know whether this could be a Glide app, or whether we'd run into a technical limitation where we can't have users scan a book's ISBN and have that do a lookup in the sheet.
Few minor notes:
- You've featured an app "Tournament of Books" but there are no links there. Both the title and the mock text conversation image can be links to that app
- Interspersed fixed width font is jarring
(OP concludes ‘programming should be more like Excel’. Glide’s mission is to apply spreadsheet programming to software in general. This is the most relevant comment, downvoters be darned!)
Huh, I have the exact opposite feeling. If a SaaS app can be boiled down to a specific spreadsheet + web UI bolted on top, it's making money by wasting everyone's time.
And as I grow older I get more and more convinced that Excel is overrated. I ask myself, WHY is Excel so popular? And I keep coming back to the same answer: because it's live. But we do have this concept in the programming community (see links 1 and 2) and it's not theoretical. When I teach Python I see people quickly understand everything that can be made live. Do this: create a variable, then create a function, then let them play with the variable and see how the output changes. This is also why SQL is so popular: because you "declare" it and forget it and you see the changes live.
Don't get me wrong, I love Excel, but it has its problems and it definitely is not a solution to everything data.
One other thing in defense of spreadsheets is that they are a tremendous introduction to data manipulation and management for people who don’t think of themselves as being ‘programmers’. Purpose built SaaS apps pre-solve problems, and that can be great, but spreadsheets allow users to become owners and there is a lot of growing that can start from that place.
That's all very well and good, but can I make a plea for organisations to stop using spreadsheets to produce tabled documents.
This seems to be endemic for information security surveys and would be fine for questionnaires with preset answers, but when a question starts "Describe..." and wants a full description of your software development cycle from an InfoSec perspective, it's painful edit-wise - even if you can copy/paste an answer from a different spreadsheet completed earlier.
As the guy on the other end who has to read the responses; it's not fun here either. Though, there's movement in the SaaS space that looks to solve a lot of this. Proper text input fields, automated yearly requests for update, audit logging and notifications (x party has responded, y party asks for clarification on question z, etc).
Expensive though, but enterprise solutions always are...
> From there, you can calculate literally anything, and transmit not only the results of those calculations, but the actual environment itself, to anyone in the world, and expect that if they have a computer, they can replicate your results.
...unless they use Excel in a different language than you, in which case it can't interpret the commands or sometimes even the syntax. Like with the german version of Excel which requires ; instead of , as separator for arguments, among other things. This alone makes me prefer pretty much everything else.
Your strength is your weakness is your strength. Spreadsheets excel at ad-hoc things because they aren't systems; they don't have restrictions and limitations on what data you can pull in, where you store it, or what you calculate. They are horrible for all the same reasons: people end up spending tons of time semi-manually joining up data from different sheets and copy-pasting numbers from emails into calculations. The sheets take on a life of their own as they are emailed around between people, and errors crop up: "oh, I think you have the version of the sheet that doesn't work in October [real example], that got fixed in a version that was sent to Peter, check the mail stream to see who the original was sent to..."
And they grow over time without any obvious way of optimizing old dependent calculations, so you end up with sheets that do company-wide economic calculations but take 10 minutes just to open, so IT ends up setting up batch jobs to open the sheet every day, copy in new data, and generate a PDF of the results [also a real example].
Of course you can just put systems and rules around your spreadsheet practices and start adding passwords and accompanying documentation of "what you must do when using this document". But when you get into these situations it's almost always a better decision to set up an actual system to handle your data.
While it’s possible to “easily” write a computer program that is hard to understand later, the gap is much wider with Excel: many more people can create stuff, and the risk of being left with an undecipherable mess seems greater.
I am also still, in 2019, risk-averse about these complex file formats, even if they are widely used. I can open a real computer program in dozens of editors and run it on many platforms. Yet with Microsoft's own SharePoint solution, half the time Excel files can't be opened: my web browser just hangs and then I have to download and open the file in Excel. That's just crap.
For me, spreadsheets are also frustrating because they can make very poor use of space (and this happens in some other user interfaces too). I shouldn’t be forced to see only 3 numbers at once on a giant display just because they happen to be in cells that are a million miles away and separated by useless empty/unused cells. This feels like what you’d see if web sites decided to just dump their raw database tables onto the screen instead of presenting the data usefully. My theory is that people are just really adaptable, to an unpleasant degree; I’m amazed when I see people squint and tolerate absurd truncation of data and other unhelpful displays.
In cases where Data Science is a replacement for astrology ("just tell me some reassuring mumbo jumbo to alleviate the burden of decision making"), the inevitable bugs may be irrelevant.
I appreciate the theoretical power of Excel. I just don't see how keeping logic in Spreadsheets bug free could be possible. The logic is hidden away and hard to get to. It is already hard enough to debug classical code. Spreadsheets seem impossible.
I worked in spreadsheet DevOps at a financial company. We had a Python script to spin up Windows VMs with the exact version of Windows that was known to be good and the exact version of Excel, and then automatically kick off the computation.
The reason for this was that someone once got the order of execution wrong and screwed up a couple billion dollars' worth of trades.
Excel is great until you have to maintain it. Same reason why you don't let people build bridges with Lego.
Is there anything like a spreadsheet to Python converter, with cells becoming made up variable names? I wonder if it’d be useful for quickly noodling together a set of expressions, then “baking” it into code.
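I don't know of a standard tool for this, but the core of the idea is small enough to sketch. The snippet below is a hypothetical toy: cells are given as Python expressions keyed by made-up cell names (a real converter would parse actual Excel formulas, e.g. read via openpyxl), and "baking" just evaluates them in dependency order:

```python
# Toy sketch of "baking" spreadsheet cells into Python variables.
# Cells are Python expressions that may reference other cells by name.
cells = {
    "A1": "10",
    "A2": "32",
    "B1": "A1 + A2",      # like =A1+A2
    "B2": "B1 * 2",       # like =B1*2
}

def bake(cells):
    """Evaluate cells in dependency order; return {cell_name: value}."""
    env, pending = {}, dict(cells)
    while pending:
        progressed = False
        for name, expr in list(pending.items()):
            try:
                env[name] = eval(expr, {"__builtins__": {}}, env)
            except NameError:
                continue  # depends on a cell not evaluated yet
            del pending[name]
            progressed = True
        if not progressed:
            raise ValueError(f"circular or unresolvable cells: {list(pending)}")
    return env

print(bake(cells)["B2"])  # 84
```

A real version would also need to invent readable variable names and handle ranges, which is where it would get hard.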
One major drawback of using spreadsheets is that it is harder to spot calculation mistakes. A famous/infamous recent example is the Reinhart and Rogoff incident in 2013, involving their heavily cited paper "Growth in a Time of Debt". When a research student named Thomas Herndon tried to replicate the paper's findings, he found, after careful inspection of the Excel spreadsheet the two Harvard professors used, that the spreadsheet was riddled with errors. Some of them were even rather trivial in principle, such as mistaking the sum of one column for the sum of another. The incident was widely reported in many business newspapers, and Paul Krugman commented on it in the New York Times.
Had the steps of the computations been written as a program, it might have been easier for the authors to discover their mistakes. With a spreadsheet, if you put formulae inside data cells, you need to click all these cells one by one to see if the formulae have been input correctly. This is tedious.
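As a hedged illustration (with invented numbers, not the actual Reinhart-Rogoff data), the wrong-column mistake is exactly the kind of thing a code review can catch, because the intent is written down:

```python
# Summing the wrong column is easy to miss in =SUM(B2:B4) vs =SUM(C2:C4),
# but hard to miss when each aggregate is a named, reviewable line.
rows = [
    {"country": "A", "growth": 2.1, "debt": 60.0},
    {"country": "B", "growth": 1.4, "debt": 95.0},
    {"country": "C", "growth": 0.8, "debt": 120.0},
]

total_growth = sum(r["growth"] for r in rows)
total_debt = sum(r["debt"] for r in rows)

print(round(total_growth, 1), total_debt)  # 4.3 275.0
```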
>With a spreadsheet, if you put formulae inside data cells, you need to click all these cells one by one to see if the formulae have been input correctly. This is tedious.
Not really: you can choose to show formulas in the settings. (Reading them without making a mess of the formatting/size of the cells is another thing, but you can always make a copy and inspect the formulas there, "ruining" the formatting.)
It is still tedious, actually very tedious, but about as tedious as reviewing source code.
If spreadsheets work for your use-case or industry, then by all means keep using them. Occasionally, I'll use one myself for some business purpose. However...the problem with spreadsheets is that people routinely use them for the wrong reasons. Those wrong reasons are the cause of the spreadsheet hate. It's not spreadsheets themselves that cause this, it's people with limited knowledge and limited ambition to learn how to use the correct tool for a task.
The saying "when all you have is a hammer, everything looks like a nail" applies.
Often, someone using spreadsheets will suspect that their task may be better accomplished in some manner of programming. This leads them to try to do it. If they are in this situation, chances are, they are not a professional programmer. They are going to have a poor experience, and likely fail. This is due to sub-par or non-existent software engineering skills, and not due to spreadsheets being the correct tool for every task.
Try to find duplicates or do a VLOOKUP in a table/sheet with more than 10,000 rows.
It will only look at the first part of your data, and skip a lot if you don’t remember to sort your columns first.
Yes, spreadsheets are a great start. I often ask people to "prototype" in Excel; after that, SQL rules.
I don't know what you mean by "find duplicates", but VLOOKUP has an option to specify an exact match, so you can use it even if the data is not sorted (MATCH does as well).
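For what it's worth, both tasks are also only a few lines outside the spreadsheet; a sketch on made-up data:

```python
from collections import Counter

# Made-up rows: (key, value) pairs, deliberately unsorted.
rows = [("b", 2), ("a", 1), ("c", 3), ("a", 9)]

# "Find duplicates": keys that appear more than once.
counts = Counter(k for k, _ in rows)
duplicates = [k for k, n in counts.items() if n > 1]
print(duplicates)  # ['a']

# Exact-match lookup (like VLOOKUP with range_lookup = FALSE):
# no sorting required, and a dict scales well past 10,000 rows.
table = dict(rows)       # later pairs win, a last-match rule
print(table.get("c"))    # 3
print(table.get("z"))    # None instead of #N/A
```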
This morning I was just thinking about how spreadsheets are a great tool that would be worth a blog post.
In the past I built server dashboards on Google Sheets that consume live data through JSON and automatically refresh.
I wrote a real-time labeling system for ML to categorize images in less than a day.
I even wrote a behavioral reinforcement system to build good habits that shows in real time the results on an Android widget.
A spreadsheet is a relatively simple tool that most people understand (and often undervalue). It has a lot of flexibility, lets you build "good enough" interfaces for internal tools in a fraction of the time it takes to build a web UI, and is easy to iterate on.
You get a lot of value for the time invested in a Google Sheet. They are free, and from a user perspective they do not require any infrastructure to run.
Spreadsheets are the best -- specifically Google Sheets. The addition of QUERY really allows Sheets to be superior to Excel, at least for the things I use it for.
Combine that with an IMPORTJSON script, and you can create really clean, structured data sets for nearly anything you can dream up. I've been tracking movies and TV shows, for instance. Combining a few of the APIs out there to get the best possible results really makes the set shine.
I don't code beyond modifying basic things, but I rarely find something I cannot do with a spreadsheet.
I've been doing Excel and VBA work for my internship this summer, and I have to say, I have an immense amount of respect for the engineers behind Excel. Excel is incredibly extensible, it has very nice built-in error handling for mangled data (but horrible exception handling in VBA), and overall it is simply a joy working with it.
Sure, it has some pain points, but for my use case (financial analysis), it complements Bloomberg very well. Bloomberg's Excel add-in is very well-engineered, and there is even a way to hook into Bloomberg through VBA. Cheers to the MSFT devs who crafted this stuff.
I think spreadsheet usage can be done in good ways, but not when it is done in Excel, because Excel is a typical MS lock-in product.
Things that should be obvious to do when having something important in a spreadsheet:
* Use well-labeled cells (by row name, column name, the cell above, below, or to the left, whatever) for intermediate results.
* Add mistake-checking formulas that show you a warning when there seems to be a calculation or entry mistake, like assertions in normal programming.
* Use plain text formats and use version control. Do not run around with 10 copies of the file named after what the index of the copy is.
* Have explanations of the formulas and the reasoning behind them somewhere, maybe even best inside the spreadsheet.
* Only use macros if there is no other way. Macros simply break things, at least in Excel. Only a few days ago I witnessed a case where someone simply could not run a macro, even after reinstalling Excel, using an MS cleaning tool to "completely" remove Excel, and various other attempts. And it only happened on that person's machine. Macros are not as reliable as plain formulas.
* Have your data elsewhere as well. Do not use a spreadsheet as your single database.
And those are only the few things that come to my mind, although I am not a daily spreadsheet user. More frequent users might have many more guidelines.
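The assertions bullet can also be enforced from outside the sheet: export to a plain-text format (which helps with the version-control bullet too) and run sanity checks over it. A sketch with invented column names and figures:

```python
import csv
import io

# Pretend this came from exporting the sheet; the columns are made up.
exported = """item,qty,unit_price,total
widget,2,3.50,7.00
gadget,1,10.00,10.00
"""

checked = 0
for row in csv.DictReader(io.StringIO(exported)):
    qty = float(row["qty"])
    price = float(row["unit_price"])
    total = float(row["total"])
    # Assertion-style check: the 'total' column must equal qty * unit_price.
    assert abs(qty * price - total) < 1e-9, f"inconsistent row: {row}"
    checked += 1

print(f"{checked} rows consistent")  # 2 rows consistent
```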
I also recommend people to take a look at Emacs Org mode spreadsheets in combination with various other Org mode functionality. Those can be quite neat.
I agree, as someone who spent quite some time building fancy admin dashboards for our team, in my most recent project I chose to abandon this and just write an Excel exporter and importer so they could edit the data in a familiar tool. The flexibility of this was huge for our team - everyone in an office has some experience of Excel and they were more productive with it than they ever were with my previous data grids and clunky web based editors.
Just don't try to use dates before January 1, 1900 (or 1904 on Mac). You can work with dates in Excel 1,000 years in the future, but if you try to do that for dates from the 1800s it will completely screw them up. I don't understand why this problem still exists today. If you work in a museum, or anywhere else where you deal with old dates, you have to constantly be on the lookout for this "gotcha."
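If you do have to handle historical dates, one workaround is to keep them as text in the sheet and do real date arithmetic elsewhere; Python's datetime, for example, has no 1900 floor (the specific dates below are just illustrative):

```python
from datetime import date

# A 19th-century date that Excel cannot represent as a date value.
opened = date(1857, 5, 24)
moved = date(1990, 3, 1)

# Date arithmetic works fine across the 1900 boundary.
print((moved - opened).days)
```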
The spreadsheet is the original 'serverless' programming environment. It allows relatively untrained people to augment their brains to levels that they would otherwise definitely not achieve. This has obvious limitations but I still believe that the spreadsheet is the 'killer app' for all of computing, back when VisiCalc was first released the sales of small computers went up in an incredible way.
Serverless is really the wrong term to use here. That refers to a specific way of handling cloud infrastructure (the point being that there are still servers, you just don't care)
Excel is used for client-side-only programs. There are no servers at all. All programs were originally client-side only, because networks didn't exist.
Google sheets (and other google docs) can be programmed in "serverless" JavaScript that runs in the cloud somewhere. It's hellishly slow making sheets API calls, though. Feels like some kind of remote procedure call. (Slower than driving Excel via OLE Automation even, and that's saying something!) Then it times out on a wall clock (not cpu time) limit, and breaks if you take too long.
A CS grad student friend of mine was in a programming language class, and the instructor was lecturing about visual programming languages, and claimed that there weren't any widely used visual programming languages. (This was in the late 80's, but some people are still under the same impression.)
He raised his hand and pointed out that spreadsheets qualified as visual programming languages, and were pretty darn common.
They're quite visual and popular because of their 2D spatial nature, relative and absolute 2D addressing modes, declarative functions and constraints, visual presentation of live directly manipulatable data, fonts, text attributes, background and foreground colors, lines, patterns, etc. Some even support procedural scripting languages whose statements are written in columns of cells.
Maybe "real programmers" would have accepted spreadsheets more readily had Lotus named their product "Lotus 012"? (But then normal people would have hated it!)
What in spreadsheets is not testable? It's trivial to write a test that opens the spreadsheet in Excel and tests any formula calculations via the Excel Automation API.
You store them in the same place where you place your other tests. It's just code. You version the data as you version any other test data. Never used Travis myself; I guess it's some kind of CI.
Many people do this. It's perfectly normal.
Of course if you don't even use Excel in the first place, I don't know why you're asking.
A bit short on details. What exactly was it about this spreadsheet that made it hard to recreate in a programming language? Personally I find this hard to believe unless one of the key deliverables is "pretty much every parameter to this calculation must be tweakable by the user with the results being visualised instantly".
Spreadsheets are great tools, but they are not suitable for every kind of problem. I use Excel daily too, but I also use a lot of other programming languages as well.
Most of the time, complex calculations are easier to do in a programming language than in a spreadsheet, but viewing them in anything else than a spreadsheet is a royal pain in the butt.
This is one of the reasons I use Google Sheets regularly for such tasks at work. The API allows you to do all the hard work in your language of choice and then post the output into a spreadsheet for viewing or other manipulation better suited to spreadsheet-style interaction. We have implemented all sorts of ad-hoc interactive sheets that can be viewed by anyone by emailing the shared URL around. It's a powerful tool for transcending the code-to-reporting boundary.
Love them or hate them, I think Google has done a great thing here. I'll be sad when they decide to suddenly drop support for it ;)
My first job out of University was working for a large US custodial bank (not in the IT dept). We were responsible for monitoring corporate actions happening on the securities held by their clients and accounting for the proceeds.
Most of the work was done in some old IBM 390 terminal emulators. The workflow was generally to scrape a bunch of information from the terminal into Excel, reformat it and figure out any discrepancies, then copy some adjustments into another workbook which would automatically enter the data into some screen or other in the terminal emulator to fix the discrepancies.
Someone had built a COM object which could be scripted quite easily in Office VBA. I found a few of the scripts they had written for it hadn't been protected and quickly learned to write my own.
Spreadsheets are awesome, and I think that another thing people are dismissing is grep, awk and friends.
I get it that both spreadsheets and unix tools might not be trendy, but they fulfill the Taco Bell Programming idea [1]; they provide simple, scalable, and efficient solutions.
Summing data from column B for each unique value in column A would be as easy as
`awk -F ',' '{a[$1] += $2} END{for (i in a) print i, a[i]}' my-input-file.csv`
Sure, it would take a few tries to get right, but could for example R do something this concise and dynamic that can run in parallel?
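For comparison, roughly the same group-and-sum in Python is a few lines longer but arguably easier to read back later (the sample data here is made up):

```python
import csv
from collections import defaultdict

def sum_by_key(lines):
    """Sum column 2 for each unique value in column 1, over CSV rows."""
    totals = defaultdict(float)
    for key, value, *_ in csv.reader(lines):
        totals[key] += float(value)
    return dict(totals)

# Same job as the awk one-liner, run on inline sample data;
# pass open("my-input-file.csv") instead for a real file.
print(sum_by_key(["a,1", "b,2", "a,3"]))  # {'a': 4.0, 'b': 2.0}
```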
Yes and no. That command is easy to understand and use only if you know what all those symbols do, whereas most Excel functions are written in a way most people will understand without major difficulty.
Grep, awk, and friends are powerful (only) in the right hands, but they are not in the same ballpark as Excel.
We were building our zero-code app creation service when we faced the situation where users needed to add complex logic, calculations, etc., and creating separate draggable modules and buttons for every requirement would not really solve the problem.
So we added draggable blocks and provided autogenerated variables for all elements just like the cell number in Excel. For all complex logic and calculations, the users can simply write a formula the way they write in Excel and complex apps can now be created without any coding language.
I can hardly imagine what is easy in Excel but hard in Python with Pandas, SQLAlchemy/PyTables/whatever, and matplotlib.
What really annoys me about Excel is that they won't replace the fossil VBA with Python, F#, or a new language designed from scratch for exactly this. The VBA environment is fun to touch if you want the feeling of time-travelling back to the years of your childhood, but it feels quite clumsy for actual programming.
Excel's plotting feature also feels fairly weird. I could never make it produce exactly the plot I want. Perhaps that's because I haven't mastered it, but that means it is harder to master than matplotlib.
I'm a professional software developer and I open excel every day.
Let's say someone emails me a list of figures and I want to quickly add them up.
Sure, I could write an incantation in awk, but then I can't see if it's wrong: maybe on one row they 'accidentally' put an extra blank column before the number, or marked it with a letter, or there's an extra tab, or whatever else might cause that.
It's far quicker and less error-prone to pop open Excel, paste it in, and sum the column; most importantly, there is clear visual feedback if that doesn't work. In awk or a programming environment you'd either get an incorrect figure and never know it was incorrect, or you'd get an error (e.g. trying to add a letter and a number) and then have to debug what should have been an instant thing.
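To be fair to the programming side, you can buy back that visual feedback with a few extra lines, surfacing the bad rows instead of silently miscounting (made-up figures):

```python
# Sum a pasted column of figures, but surface the rows that don't parse
# instead of silently skipping them or producing a wrong total.
pasted = ["100", "250", "x 37", "42"]   # one row has stray junk in it

total = 0.0
bad_rows = []
for i, raw in enumerate(pasted, start=1):
    try:
        total += float(raw)
    except ValueError:
        bad_rows.append((i, raw))       # the Excel-style "that looks wrong" signal

print(total)      # 392.0
print(bad_rows)   # [(3, 'x 37')]
```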
Paine has long worked with the European Spreadsheet Risks Interest Group (eusprig.org) and has used category theory to develop a system in Prolog called Excelsior about which he says "Excel lacks features for modular design. Had it such features, as do most programming languages, they would save time, avoid unneeded programming, make mistakes less likely, make code-control easier, help organisations adopt a uniform house style, and open business opportunities in buying and selling spreadsheet modules. I present Excelsior, a system for bringing these benefits to Excel."
Examples of Excelsior:
"Less Excel, More Components: presentation to EuSpRIG 2008 [by] Jocelyn Ireson-Paine":
(This might be a different take on the topic, but just a thought)
I have just dipped my toes into the corporate world and, to be honest, I had not heard the term 'Excel' in the workspace or otherwise. Actually I did, but that was when its major pain point was discussed: online capability. A customer needed an on-the-go Daily Sales Report app. (What I learnt, and it is obvious as well, is that depending on the office/Excel expert to send the report is not ideal.)
My point is, the shift to "the cloud" has already begun right? Freshers today will be up the hierarchy tomorrow and are they really going to use Excel?
This is what I am currently working on: Making programming fluid and easy like spreadsheets (or alternately, elevating the niche occupied by spreadsheets into something that can be called "real programming").
There is LibreOffice Calc, which you can program with Python [1]. The best of both worlds. I recently created a suggestion [2] to also support the Julia language. Hopefully they will consider it.
did you ever "program" in LibreOffice with Python. Last time I checked, the API were veeery thin wrappers around the Java-API itself - I guess that this is hindering adoption (and not not having bindings to X)
Our company had sophisticated mathematical models written for our industry, developed in the late '90s/early 2000s. When they tried to port these models to "a more modern format", i.e. programming-language code, they realised that it was far more complex than they had anticipated, and even small mistakes would be disastrous for the company.
Their solution? Dozens of virtual servers around the world, each one hosting a copy of Excel with a 20-30MB workbook, communicating with the outside world via TCP/IP and COM interface code.
Spreadsheets are great. I don't know any alternative to spreadsheets that does what spreadsheets do so well. And believe me, I have been searching for a long time.
That said, like any other tool, spreadsheets have their perks and their drawbacks. It's important to consider those drawbacks when considering spreadsheets.
My only real problem with spreadsheets is that once they get too big, there's almost no turning back. There's no easy fix if someone screwed up at the beginning.
Even though conventional programming languages are less visual, it's so, so much easier to modify models.
But with that said, I do use spreadsheets for 90% of my daily needs, as far as calculations go. Just type "sheets" into the search bar, and I can start working (google sheets) in 2 seconds. It's even faster than firing up notepad.
Anyone have any advice on how to learn to use Excel well? I've only ever done simple column operations before, but I've seen spreadsheets that do crazy complicated things.
Yes, check out the two books by John Walkenbach, Excel 2013 Formulas and Excel 2013 Power Programming with VBA. Not too much has changed since those versions-- I originally read the 2007 versions of them and it all transfers over because Microsoft is incredibly good about maintaining backwards compatibility.
There was a new app posted here a few weeks ago that combined a Jupyter notebook with a spreadsheet-like automatic recalculation. Does anyone here know what it's called?
I convert cell equations to visual basic so that I don't screw up the cells and so I can share my results without giving away all my code.
There are also methods to animate simple graphics for the purpose of providing high level presentations.
I 'learnt to code' by recording macros and then reworking the syntax to suit my needs. After a while it becomes much less difficult than it appears at first. And I will admit my code isn't a model of perfection, but it gets the job done.
Excel conflates the ideas of data and presentation. This leads to an entire class of headaches that just aren't necessary. It's the desktop application equivalent of the string 'null'.
If the spreadsheet layer (calculations and formatting) was separate from the data layer (types and values) then we could all be happy.
Wrap that up with a UI that wasn't designed at an office in Redmond (or for a web browser) and you'd really have a winner.
Not really, maybe I should write something. It's a thought I frequently have in the depths of Excel Hell. I thought for a long time about why Excel is so frustrating to me. I think fundamentally it comes down to Excel putting the spreadsheet ahead of the data.
If the data was clearly separated from the spreadsheet itself then the logic becomes much easier to reason about and test and the data is no longer susceptible to the Excel data loss.
The display layer really needs to be logically separate from the data. For example if I change a column format from text to number and back I should not lose the leading zeros in the actual data.
Spreadsheet tools are great visual programming environments but they are lousy databases. The data layer in Excel leaves a lot to be desired but there's really no reason a spreadsheet has to be so limited.
Basically I'd like to see some kind of hybrid SQL(database) client/Spreadsheet UI/Pivot Table builder where all three concepts are first class.
The thing that's missing from "Google Docs" is a decent collaborative outliner called "Google Trees", that does to "NLS" and "Frontier" what "Google Sheets" did to "VisiCalc" and "Excel".
And I don't mean "Google Wave", I mean a truly collaborative extensible visually programmable spreadsheet-like outliner with expressions, constraints, absolute and relative xpath-like addressing, and scripting like Google Sheets, but with a tree instead of a grid. That eats drinks scripts and shits JSON and XML or any other structured data.
Of course you should be able to link and embed outlines in spreadsheets, and spreadsheets in outlines, but "Google Maps" should also be invited to the party (along with its plus-one, "Google Mind Maps").
It should be like the collaborative outliner Douglas Engelbart envisioned and implemented in his epic demo of NLS:
Dave Winer, the inventor of RSS and founder of UserLand Software, developed a wonderful outliner on the Mac, originally called "ThinkTank" and then "MORE", which later evolved into the "Frontier" programming language, and ultimately the "Radio Free Userland" desktop blogging and RSS syndication tool.
More was great because it had a well designed user interface and feature set with fluid "fahrvergnügen" that made it really easy to use with the keyboard as well as the mouse. It could also render your outlines as all kinds of nicely formatted and stylized charts and presentations. And it had a lot of powerful features you usually don't see in today's generic outliners.
>MORE is an outline processor application that was created for the Macintosh in 1986 by software developer Dave Winer and that was not ported to any other platforms. An earlier outliner, ThinkTank, was developed by Winer, his brother Peter, and Doug Baron. The outlines could be formatted with different layouts, colors, and shapes. Outline "nodes" could include pictures and graphics.
>Functions in these outliners included:
>Appending notes, comments, rough drafts of sentences and paragraphs under some topics
>Assembling various low-level topics and creating a new topic to group them under
>Deleting duplicate topics
>Demoting a topic to become a subtopic under some other topic
>Disassembling a grouping that does not work, parceling its subtopics out among various other topics
>Dividing one topic into its component subtopics
>Dragging to rearrange the order of topics
>Making a hierarchical list of topics
>Merging related topics
>Promoting a subtopic to the level of a topic
After the success of MORE, he went on to develop a scripting language whose syntax (for both code and data) was an outline. Kind of like Lisp with open/close triangles instead of parens! It had one of the most comprehensive implementations of Apple Events client and server support of any Mac application, and was really useful for automating other Mac apps, earlier and in many ways better than AppleScript.
Then XML came along, and he integrated support for XML into the outliner and programming language, and used Frontier to build "Aretha", "Manila", and "Radio Userland".
He used Frontier to build a fully programmable blogging and podcasting platform, with a dynamic HTTP server, a static HTML generator, structured XML editing, RSS publication and syndication, XML-RPC client and server, OPML import and export, and much more.
He basically invented and pioneered outliners, RSS, OPML, XML-RPC, blogging and podcasting along the way.
>UserLand's first product release of April 1989 was UserLand IPC, a developer tool for interprocess communication that was intended to evolve into a cross-platform RPC tool. In January 1992 UserLand released version 1.0 of Frontier, a scripting environment for the Macintosh which included an object database and a scripting language named UserTalk. At the time of its original release, Frontier was the only system-level scripting environment for the Macintosh, but Apple was working on its own scripting language, AppleScript, and started bundling it with the MacOS 7 system software. As a consequence, most Macintosh scripting work came to be done in the less powerful, but free, scripting language provided by Apple.
>UserLand responded to Applescript by re-positioning Frontier as a Web development environment, distributing the software free of charge with the "Aretha" release of May 1995. In late 1996, Frontier 4.1 had become "an integrated development environment that lends itself to the creation and maintenance of Web sites and management of Web pages sans much busywork," and by the time Frontier 4.2 was released in January 1997, the software was firmly established in the realms of website management and CGI scripting, allowing users to "taste the power of large-scale database publishing with free software."
Spreadsheets are phenomenally flexible, but they obviously don't scale well or behave nicely with source control.
I work at Anaplan, and the most common way that our biggest customers discover us is when they've been bitten by spreadsheets as they've scaled, and now they have users emailing spreadsheets around and someone with a full time job collating them.
We've modelled the product around that flexibility, but with rigor and scale on top of it.
Excel is everywhere where I work. It frustrates me to see how it's used and how much it's used, but maybe that's unfair: I could rant about how it's unmaintainable, error-prone, and time-consuming, and how it causes costly mistakes, but I wonder if the company would be as large as it is without Excel.
I dislike a lot about Excel, but it's so easy to be immediately productive with it.
> and transmit not only the results of those calculations, but the actual environment itself, to anyone in the world, and expect that if they have a computer, they can replicate your results.
I think this is the real secret. It's really the only kind of "model" where the environment travels with it. Everyone has the same Excel setup. If it works for you it will work for them.
I wish this were true. The biggest problem is different language versions, which cause all sorts of trouble. Did you know that Excel formula names are localized? For example, AVERAGE is MITTELWERT if you have a German Excel, and there are some with umlauts as well.
But there are subtler problems too. I'm an engineer and I used to work a lot with radians and degree values. There is a nice little trick in Excel where you can enter radians and have them displayed as degrees. It works using the date format, but only if you have 1900 dates enabled. I used that for a short while, until I noticed that all dates in spreadsheets from other people were off by 30 years.
Edit: I see someone made the same point below, but I definitely haven’t had Excel installed in years. And that’s across multiple billion-dollar employers.
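For anyone tempted by the date-format trick above: Excel also has DEGREES() and RADIANS() worksheet functions, and outside the sheet the explicit conversion is trivial, e.g. in Python:

```python
import math

# Explicit angle conversions; no date system involved.
print(math.degrees(math.pi / 2))  # approximately 90.0
print(math.radians(180))          # approximately 3.14159...
```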
I think one of the main takeaways is: included batteries are underrated. This is seemingly one of the main appeals of Python.
Another is: an integrated solution that's been designed holistically, instead of blindly assembled out of modules, does wonders for reliability. See: the difficulty of setting up a JS build environment vs a Rust build environment. Also, the success of Atom vs VSCode.
Difficulty recreating an Excel spreadsheet in a general-purpose programming language isn't evidence of the superiority of spreadsheets, it's just a consequence of the object/relational impedance mismatch. I strongly suspect that reimplementing OP's spreadsheet in MySQL wouldn't have presented any deep problems.
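To make that concrete, here's a sketch of a typical sheet's "calculated column" as a relational query, using SQLite in place of MySQL for brevity; the schema and numbers are invented:

```python
import sqlite3

# A spreadsheet tab as a table, and a calculated block of cells as a query.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (region TEXT, qty INTEGER, unit_price REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("north", 3, 2.0),
    ("north", 1, 5.0),
    ("south", 2, 4.0),
])

# Roughly the equivalent of a SUMPRODUCT-per-region region of cells:
result = db.execute("""
    SELECT region, SUM(qty * unit_price) AS revenue
    FROM orders GROUP BY region ORDER BY region
""").fetchall()
print(result)  # [('north', 11.0), ('south', 8.0)]
```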
But sometimes it is necessary to extract the logic, processes, and behaviors from a spreadsheet into an application that can add the stability, traceability, and redundancy that an enterprise tool requires from a legal, compliance, and technical point of view.
I was once tasked with the thankless job of doing exactly that.
I can't help but think that people using Excel to accomplish their task instead of writing a program are a lot like Pythagoras doing geometry without algebra. Getting away from the numbers/code makes it easier to visualize things.
The thing about finance trader types removing the F1 key from their keyboards so they don't accidentally open Help while using Excel at high speed says a lot about how handy Excel is (and how slowly Help loads).
I think spreadsheets are an absolutely great idea. The problem is the current implementations. I have to deal with Excel files a lot in my work, and it's a nightmare.
This is why I’ve built React Spreadsheet[1]. To empower users and programmers to build interactive computational interfaces.
[1]: iddan.github.io/react-spreadsheet
Worked on a simulation game in the '80s. My boss coded the entire world simulator as an Excel spreadsheet. I then wrote a tool to compile that spreadsheet and its formulas into 68k assembly language that would run the sim for one tick. It worked amazingly well. I have never doubted the power of spreadsheets since, and whenever a problem in a similar domain comes up, it's one of my go-to tools.
With Excel, chances are I either have it on my computer, it is a single click download away, or I have another program already on my computer that can open it.
I certainly don't have Excel installed on my computer. It's not even free. And it's far harder for me to install it compared to free software, like Docker or Python (for which I can just type `brew install ...`).
If you own a Mac, it's already got Python installed on it.
The Venn diagram of people who can spin up docker containers + Jupyter notebooks vs. people who prefer to use Excel for similar scenarios has very little overlap...
Yes, that's why the statement "Reactive programming is useful" is interesting to me, and "You should model your problems in Excel" is not. The article makes both points, and I disagree with the second.
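The reactive part is easy to illustrate outside a spreadsheet. A toy sketch of reactive cells in Python, where each cell is a node in a dependency graph and edits push recomputation downstream (this is an illustration of the model, not how Excel is actually implemented, and it skips cycle detection and recomputation ordering):

```python
class Cell:
    """A toy reactive cell: holds a value, or a formula over other cells."""
    def __init__(self, value=None, formula=None, inputs=()):
        self.formula = formula      # callable over the input cells' values
        self.inputs = list(inputs)  # cells this one depends on
        self.dependents = []        # cells that depend on this one
        self._value = value
        for cell in self.inputs:
            cell.dependents.append(self)
        if formula:
            self._recompute()

    def _recompute(self):
        self._value = self.formula(*(c.value for c in self.inputs))
        for dep in self.dependents:  # push the change downstream
            dep._recompute()

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, v):
        self._value = v
        for dep in self.dependents:
            dep._recompute()

# The equivalent of putting "=A1+B1" in C1:
a1 = Cell(2)
b1 = Cell(3)
c1 = Cell(formula=lambda a, b: a + b, inputs=(a1, b1))
print(c1.value)  # 5
a1.value = 10    # editing a cell recalculates its dependents
print(c1.value)  # 13
```

The appeal of the spreadsheet is that users get this dataflow model for free, with a grid as the UI, without ever writing a dependency graph themselves.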
This specifically is nonsense in the article:
> From there, you can calculate literally anything, and transmit not only the results of those calculations, but the actual environment itself, to anyone in the world, and expect that if they have a computer, they can replicate your results.
I certainly can not expect that people I'm interested in have Excel. I don't even have Excel.
Ah yes, the confident programmer that would DIY everything.
I feel like underestimating Excel is a classic example of how an expert can have holes in their thinking.
With VBA, Excel becomes a ton more useful, since it adds real programming. At its core, Excel is a visual database you can share with coworkers, encapsulated in a single file.
There is a time and place for everything. Excel is very useful for workplace data sharing and manipulation.
I like spreadsheets because they are reliable and logical, unlike whatever trash programmers tend to use. Spreadsheets do what they say they do. Outputs are functions of inputs, like math.
Meanwhile software “engineers” are using programming languages that eschew logic, reject reason, reject mathematics, reject determinism, and do different things each time you run them.
Yeah I’ll take a spreadsheet over whatever ball of confusion software engineers are chasing their tail with.