I work in critical infrastructure planning. My organization builds software in R, Python, and other programming languages customized for these major organizations.
So many critical infrastructure systems, billions of dollars in planning, and the whole thing is just built out of Excel. It's amazing. You'd assume something that serves millions of people a day would have a more sophisticated and customized solution, but you'd be wrong.
The reason is that most people know how to use Excel. Most people know how to handle it, use it, modify it, and build on it. You don't need a regular support contract from a company for your custom-built spreadsheets. If something critical in that "Excel software stack" breaks (aka Microsoft Excel itself), most of the time you just need to reset Excel. It's amazing. You don't need a new server, you don't need support contracts, it's just like riding a bike.
In my opinion, the real limitation of Excel is big-data analytics. The way Excel handles large quantities of data is slow (due to the nature of its software structure). That's where we come in and build out these models and systems. In the end, though, our data outputs get fed back into Excel, because that's what most people are used to.
I have deep respect for Excel. After all, Excel empowers so many users who don't know how to code to provide amazing plots and perform major calculations with ease.
Yeah it works and you don't need support contracts and then:
- Fidelity's "Minus Sign Mistake": loss of $1.3 billion
- TransAlta "Clerical Error": loss of $24 million
- Fannie Mae "Honest mistake": loss of $1.3 billion
Then you get employee turnover, where new employees don't get "arcane" knowledge passed down by people who left and took their spreadsheet foo with them.
Excel does not have "access control", "auditing", "change tracking". Just getting work done is not enough.
> - Fidelity's "Minus Sign Mistake": loss of $1.3 billion
> - TransAlta "Clerical Error": loss of $24 million
> - Fannie Mae "Honest mistake": loss of $1.3 billion
Weigh that against <insert any company> makes $X billion due to correct usage of Excel, multiplied thousands and thousands of times over. Survivorship bias at its finest.
Excel isn't perfect and has notable downsides, but work gets done on it all the time, for the aforementioned reasons.
I’m not convinced there’s such a thing as “correct usage of Excel.”
What a lot of companies do is just scare entry level employees into being very careful. My friend’s girlfriend works at a place like this and people just accept all the problems with manually editing large spreadsheets because “that’s just life.”
Anything in Excel is a hack, and most of its users don’t know any better.
The manager goes to the people who really know how to make a report (maybe that's "accounting"?) and asks them to make it, taking into account some report-specific things that they discuss. Then those people deliver. They might have someone on their team who knows how to work with dataframes, which would come in handy. They store the program written to generate the report from the data and are ready to modify it when more is requested. Instead of doing code in Excel, they simply write their code in whatever language they use to process spreadsheets or other data sources.
If they have knowledgeable people there, it will take no longer than coding something up in Excel and will be more reliable.
I guess no one has used Power BI for reporting... Link the data source, save the view that "Manager A" wants every week or month, set up the on-premises gateway, and your new reports are there before you are asked for them again. Always up to date. Then share your dashboard.
Also, you guys should be ashamed of yourselves. "Not sure who does the reports, maybe finance?" Everyone should know how to make their own reports.
If you regularly produce reports, one would suppose that you would have an existing project template and know-how that would allow a reasonable individual to deliver a result that isn't all wrong, in a length of time not THAT dissimilar from the time to throw up a spreadsheet.
This also wouldn't silo the data on the user's laptop, subject it to being destroyed because the user dropped it, or limit the frequency of updates and access to how fast the user can respond to emails.
And meanwhile the manager that broke the rules, connected their personal copy of excel straight into the database and made the company 8 figures in profit over the course of the last year just outdid you and your team of report programmers.
And meanwhile an actual data scientist looked at the data and outdid the manager by a factor of 10, by creating a more meaningful report, that is based on an actual statistically sound basis, preventing a disaster.
Logically, while most profit-making activities may be enabled by tech, the profit usually isn't created by tech. The manager's rule-breaking activities are probably unrelated to his success.
In my experience, business success is often hinged on the reports and the insights they provide. Without accurate numbers, you've got nothing to go on.
I think it will be easier to solve those problems in the Excel development space than it will be to export the Excel solutions to another solution domain.
Also the kind of mistakes you've described can (and do) happen in other programming domains that are deemed more respectable.
Excel is hamstrung by having to maintain backwards compatibility for an endless number of hacks.
For instance, it is notorious that
0.1 + 0.2 != 0.3
in binary exponent floating point math. Excel does funny stuff with number formatting that hides this, but it is like having a bubble under a plastic sheet that moves someplace else when you push on it -- numeric strangeness appears in different places.
The right answer is to go to decimal exponent floating point math, but that is only HW accelerated on IBM Mainframes, maybe on RISC-V at some point. You'll probably crash Excel if you have enough numbers in it for performance to matter, but Microsoft would be afraid of any performance regression and it would break people's sheets so it won't happen.
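To make that concrete, here's a quick Python sketch (Python purely for illustration, it says nothing about Excel's internals): binary floats reproduce the mismatch above, while a software-emulated decimal type gives the answer an accountant would expect, at the cost of speed.

from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact base-2 representation,
# so the sum carries a tiny error.
print(0.1 + 0.2)                     # 0.30000000000000004
print(0.1 + 0.2 == 0.3)              # False

# Decimal floating point (emulated in software here): the arithmetic matches
# what a decimal calculator would show.
print(Decimal("0.1") + Decimal("0.2"))                    # 0.3
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True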
On a technical basis we could use an Excel replacement that has some characteristics of Excel, and other characteristics of programming languages; one old software package to look to for inspiration is
What makes it almost impossible to do on a marketing basis is that Excel is bundled into Microsoft Office so if you have an Office subscription you have Word, Powerpoint, Excel, Access, etc.
GP was about how Excel lacks "access control", "auditing", "change tracking". Then OP says it'll be easier to add those things to Excel than to make non-Excel suit the purpose.
Then you say no, backwards compatibility problems. But how does Excel's need for backward compatibility make it hard to add "access control", "auditing", or "change tracking"?
"access control" by definition means "make things not work some of the time"
Excel has change tracking, but like Jupyter notebooks and similar products it doesn't make the clean distinction between code and data that is necessary for it to be useful. (e.g. if I develop an analysis pipeline and use it for May 2019 it should be as easy as falling off a log to run it for June 2019)
"At its heart, VisiCalc is about numbers. One of the early decisions we made was to use decimal arithmetic so that the errors would be the same one that an accountant would see using a decimal calculator. In retrospect this was a bad decision because people turn out to not care and it made calculations much slower than they would have been in binary."
That sounds like you want to rebuild the whole paradigm of how 99% of software world treats numbers (which may be, at the end, a right thing to do but unlikely to happen at least in the near term) and you are laying the fault of it not happening at Excel's door. That would be very misplaced blame.
You don't need access control because it's just an Excel file, and whoever has the file has the planning model. In most organizations like this, there are only 5 or 8 people who all work together on a team, not 50-some people who are separated. Excel is a tool, a tool that you use to get things done. Even programming solutions have issues, such as Knight Capital Group's $460 million loss due to mistakes in their code and deployment processes. Planning modules are not operational modules; there's a big difference/jump between the two.
Again, these are all tools. What's important is how you use them and how you properly validate these calculations. The other thing you need to understand is that no critical infrastructure planning comes down to perfectly to-the-dot numbers. Our systems are far too complex, built off of human operational decisions, and have so many unknown losses that planning modules are based on scenarios. Also, we're talking about thousands of these agencies around the world, and their critical infrastructure planning teams are mostly 5 to 8 people. The current agency I'm working with has hired around 3,000 employees to operate the systems, but only 5 or so are really in-the-deep running these models and building these plans.
These people are not programmers. They spent their time learning about resource planning, mathematics, operation theory, physics, and engineering. Excel is an incredibly user-friendly tool that lets them automate a tremendous amount of their tasks. They're all very intelligent and can definitely learn how to code, but that takes their time away from more critical skills and tasks they need to accomplish.
The most important solution to the problems you've stated is the workflow. You set up a proper workflow, and the risk of these problems should be minimized.
>Then you get employee turnover where new employees don't get "arcane" knowledge passed down by people who left and took their spreadsheet foo with them.
Yeah, this never happens with a proprietary codebase!
Who needs historical knowledge when you can just rewrite the program every few years in a different environment with all new bugs? (and a smattering of the old ones, too)
Few work things have bemused me more in the last couple of years than watching someone re-implement a slow SQL process on an HDFS big-data platform, forgetting all of the things that went wrong the first time, fixing them again, and then running into weeks' worth of new bugs just to get non-time-critical data faster.
Fidelity didn't lose $1.3B, they miscalculated by $1.3B. The same with the other numbers you quote. Spreadsheets are error prone, but they still make sense for a lot of use cases.
The Fidelity "minus sign mistake" didn't create a loss.
The mistake was in relaying the information to the end user. It didn't actually cause a loss of that magnitude.
That is like saying that if I mistyped in a Word doc, Word created the loss.
Arcane knowledge will be an issue with magical spreadsheets, as it is with any software (and I could argue that with custom software you have this legacy knowledge over both code AND ui, which is worse than just a spreadsheet).
The last paragraph is key, though - auditing and access control are tough ones for excel, and there are many tasks that require this functionality.
The trade-off is customizability - any Excel sheet can be changed to your specific need, whereas that nice and shiny enterprise software you build may lack one or two things which you will never be able to change.
> Excel does not have "access control", "auditing", "change tracking". Just getting work done is not enough.
Doesn’t the web version of Excel already solve that? I’m not familiar with it but Google Sheets does these things, at least to some extent - I’d expect Office 365 to do so as well.
While I haven't encountered localized names, localized formats make Excel an absolute pain for me.
My language uses the "European" number format of a comma for decimals and periods for thousands separators. Excel tries to adjust to that by using semicolons for argument separators (i.e. ADD(1.5, 3.5) -> ADD(1,5; 3,5)).
The problem is that their locale detection is wildly inconsistent and there isn't a good way to override it without changing system settings. When moving a file between computers, this is an utter disaster.
Excel goes one step further, and assumes that users in some locales (e.g., Dutch) want semicolons as separators in every CSV file you open in Excel.
I don't care (I just use LibreOffice Calc, which accepts any delimited-values file just fine, and simply asks which separator to assume), but it means that when you develop an option for users to download some statistical data as a comma-separated-values file (which is easy to generate), you now have a problem if that user wants to open it in Excel (which makes sense for CSV) and that user is using one of Excel's broken locales (e.g., Dutch) where comma-separated values are not understood (again, it wants semicolons in Dutch).
I.e., the expected file format changes because of the locale Excel uses! So where someone in the US can just double click the CSV file (not to mention anyone using LibreOffice/OpenOffice Calc anywhere in the world), someone using Excel in the Netherlands will just get a spreadsheet where every line consists of that whole line of values stuffed into column A.
And that's not even the worst. You can teach people the few buttons necessary to read in the text, but once they save it again, the file is changed! Now you have a different separator, and chances are good that Excel also mangled date columns, removed leading zeros from numbers, and switched out decimal separators! So you may not be able to further process the file. It's also sad, because if they just made some of these changes optional, Excel would be a good tool to quickly work on a CSV file and pass it on. But as is, I am always hesitant to work with CSV in Excel, because it mostly breaks and becomes unusable for further processing.
Oh, I didn't know that was a localization thing. I use the German version and got used to opening CSVs in Notepad first (search and replace ; with ,).
I always thought that's some kind of the usual MS vs. the rest of the world thing.
You can specify what separator to use as the first line in the CSV file:
sep=,
My experience is that Excel 2007 and later will correctly parse the file and use the specified separator. However, other software, such as Google Sheets, will simply render the declaration as-is.
Then there's the issue of what character encoding to use to encode the file, whether to include a byte-order-mark with UTF-8 to make Excel recognize the file as UTF-8 and the effect that has on whether the separator line is recognized (spoiler: it isn't).
Here's part of the documentation I wrote for Calcapp's CSV exporter, which digs into these issues in more detail (Javadoc):
/**
* The prologue of files containing comma-separated values (CSV). This
* prologue contains an instruction detailing the separator character that is
* used in the file. This instruction is known to be understood by Microsoft
* Excel 2007 and later versions, but is rendered as-is by other spreadsheets,
* including Google Sheets.
* <p>
* Microsoft Excel expects either a comma or a semicolon to separate values in
* CSV files, depending on the Windows locale. The only way to produce a CSV
* file that can be read by Excel regardless of what locale Windows is set to
* use is to use a prologue similar to this one, which explicitly tells Excel
* which separator is used.
*/
private static final String PROLOGUE = "sep=" + SEPARATOR_CHARACTER + "\n";
/**
* The character set used to encode files containing comma-separated values
* (CSV): UTF-16LE (UTF-16 for little-endian systems). Using UTF-16LE allows
* characters that cannot be represented by the ASCII character encoding to be
* correctly read by Microsoft Excel and other spreadsheets.
* <p>
* There is no way to formally specify the character set used by a CSV file.
* With one exception, Excel assumes that CSV files use the ASCII character
* encoding, unless the first three bytes consist of a byte-order mark, in
* which case the UTF-8 encoding is used. (A byte-order mark is redundant for
* UTF-8, as it does not depend on endianness, but is traditionally used by
* Microsoft Windows applications to detect whether a text file uses the UTF-8
* encoding.)
* <p>
* Excel only recognizes a file as being encoded with UTF-8 if a byte-order
* mark is included, but doing so prevents Excel from recognizing the
* information of the {@linkplain #PROLOGUE prologue}, which in turn prevents
* CSV files from being produced which work regardless of the locale Windows
* is set to use. This is likely due to a bug, present in Excel 2007 and
* likely later versions as well (based on anecdotal evidence).
* <p>
* Fortunately, Excel does recognize another character set which can encode
* all of Unicode: UTF-16. Excel likely uses heuristics to determine that a
* file is encoded using UTF-16. (Text written using Western languages and
* encoded using UTF-16 tend to include many null bytes for various reasons,
* making the detection of UTF-16 trivial, but only for Western languages.)
* Unfortunately, UTF-16 is dependent on endianness, meaning that it would be
* desirable to include a byte-order mark at the beginning of the file.
* However, that does not work due to the aforementioned bug.
* <p>
* In other words, using UTF-16LE should work well for CSV files containing
* mostly Western text and parsed on little-endian systems. It is probable,
* though, that files produced using this converter will not work if the text
* mostly contains Chinese, Japanese or Korean characters or if the file is
* parsed on a big-endian system.
*/
public static final Charset CHARACTER_SET;
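For what it's worth, a minimal Python sketch of the approach the documentation above describes: write the sep= prologue followed by the delimited rows, and encode the whole thing as UTF-16LE without a byte-order mark. The function name and the comma separator are just assumptions for illustration; whether your target Excel versions honor the prologue in a UTF-16LE file is exactly the kind of thing to verify before relying on it.

import csv
import io

def write_excel_friendly_csv(path, rows, separator=","):
    # Build the file in memory: 'sep=' prologue first, then the rows.
    buffer = io.StringIO()
    buffer.write("sep=" + separator + "\n")
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerows(rows)
    # UTF-16LE, deliberately without a byte-order mark (see the notes above
    # on Excel ignoring the prologue when a BOM is present).
    with open(path, "wb") as f:
        f.write(buffer.getvalue().encode("utf-16-le"))

# Hypothetical usage:
write_excel_friendly_csv("export.csv", [["name", "city"], ["Åsa", "Örebro"]])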
If I'm going to mangle CSV into a non-standard form to conform to whatever Excel expects, I might as well go the extra mile and just provide an XLSX file.
That was my solution to a data-export function in an API (it switches on the HTTP Accept header). It now offers a choice of CSV (RFC 4180), ODS, and XLSX, and people can just pick XLSX if their Excel is using one of its broken locales.
It's not too hard to generate the XML for ODS (OpenOffice/LibreOffice etc.) and XLSX (Excel) from a bunch of tabular records once you've set up minimal empty template ODS/XLSX files to inject it in. It's certainly less work than explaining to Excel users that their software is broken and how to work around it.
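As a rough sketch of that route (not the raw-XML templating described above, and openpyxl is just one library choice among several), dumping tabular records straight to XLSX in Python can be as small as this:

from openpyxl import Workbook

def records_to_xlsx(path, header, records):
    # One sheet, header row first, then the data rows; Excel opens this
    # natively, so no locale-dependent CSV parsing is involved at all.
    wb = Workbook()
    ws = wb.active
    ws.title = "Export"
    ws.append(header)
    for record in records:
        ws.append(record)
    wb.save(path)

# Hypothetical usage:
records_to_xlsx("stats.xlsx", ["country", "value"], [["NL", 1.5], ["US", 2.25]])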
+1000. This is ridiculous. Couldn't they at least make this optional, so you'd be able to switch it off somewhere in the configuration dialog, registry, or command line? I hate localized apps altogether (who even needs localized Visual Studio, srsly? I can't believe anybody can be competent in a .NET programming language and the relevant frameworks and practices without being able to read English). Localizing some apps makes sense for some people, but localizing functions feels beyond reason. I feel thankful they didn't localize C# and command-line commands :-)
The worst of it is that there are some very specific (and undocumented) edge cases where the translation won't be done properly.
It creates very 'fun'-to-debug issues where a spreadsheet works in a French Excel and not in an English one.
(For instance if you use the "Row-Column" cell reference, in English it is `R1C1` while in French `L1C1` - and it won't translate)
Even worse, the token used to separate arguments is localized. So you may have to use semicolons instead of commas. But in some places you need to pass a string representation of a formula and there Excel will only understand the non-localized commas. Fun all around!
I once had a contracting gig where I was tasked with updating a bunch of Word BASIC macros (this was Office 95 days...) used to maintain the ISO 9001 documentation for a major company.
Problem was the original version was written in their Danish office, and whoever wrote it had handed it to the Norwegian office exported as text before he left, and they'd imported it and started munging it and adding lots of stuff before someone bothered to try running it and realized it was completely broken. I wish I'd dug more into how they'd managed to get themselves into a situation where they'd significantly modified the code before trying to run it even once...
Version control? What is version control?
So they had this broken version that had some Danish keywords with a bunch of Norwegian updates, that they didn't want to just re-import into a Danish version and copy over properly in a tokenized form and having to try to identify and re-apply the Norwegian updates. So I got the fun job of properly translating it and figuring out what the new code was meant to do and make it do it in the process, with no access to any of the people who had worked on it.
[it is worth pointing out that this was an office for several hundred staff of a major international company that developed large scale information systems for things like police departments; while version control wasn't everywhere in the mid 90's, a large systems integrator certainly ought to be using it... Particularly amusing that it was their ISO 9001 documentation that was being handled in such a haphazard manner; happy to have never been a customer of theirs, though]
The task was extra "fun" for certain values of fun, because unlike, say, English vs Norwegian, or English vs Danish, Danish and Norwegian are close enough that 80% of the time a term from the Danish version might look right and be valid Norwegian, but the odds seemed to be (and maybe my memory is exaggerating it due to my lasting memory of extended pain) about 50/50 that the Norwegian translator of Word BASIC had chosen a different function name to the Danish translator for no good reason. Or there were slight, hard to spot, spelling differences. And sometimes what looked like a mistake was a function defined locally.
I spent a couple of weeks reading through the whole thing and mostly rewriting code that could have been written from scratch in a couple of days if they'd just told me what the end result was meant to be instead of dumping a pile of non-functioning code on me. But I guess I was cheap compared to their regular staff back then.
It was how I learned Word BASIC (I'd taken the contract figuring it'd be easy to pick up, so I told the recruiter that sure, I knew it, on the basis that I certainly knew a couple of variants of BASIC). Never to use it again (as it was replaced by a Visual Basic variant a few years later anyway).
On the same hardware, with the same table, Excel in Windows 7 does a cross-tab faster than MySQL in Ubuntu does. Indeed, MySQL chokes if there are more than a few hundred columns, but Excel just keeps going. And it uses all CPU cores.
Edit: In case anyone is wondering why I did that, I wanted a simple visualization of ping-location results for thousands of IPv4 from several hundred ping-probe locations. So that meant aggregating (getting minimum rtt for) millions of observations, and displaying min(rtt) in a IPv4 by probe location cross-tab. MySQL did a great job at the aggregation, but choked (as in, errored out) with several hundred numeric columns. Even if I converted them all to SMALLINT.
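(For comparison only, since the poster used Excel and MySQL: the same aggregate-then-cross-tab step in pandas would look roughly like the sketch below. The file name and the ip/probe/rtt column names are assumptions.)

import pandas as pd

# Millions of ping observations, one row per measurement.
obs = pd.read_csv("pings.csv")   # assumed columns: ip, probe, rtt

# min(rtt) per IPv4 address per probe location, laid out as a cross-tab:
# one row per IP, one column per probe location.
crosstab = obs.pivot_table(index="ip", columns="probe", values="rtt", aggfunc="min")
crosstab.to_csv("min_rtt_crosstab.csv")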
The other thing people don't realize about Excel is that it has a compressed, in-memory, columnstore analytics database built into it. It was called Power Pivot then Power Query, not sure if it changed names again, but it's actually a SQL Server Analysis Services Tabular Model, the same thing that's in Power BI as well. It's going to perform much better than a row-oriented RDBMS at aggregating columns because that's all it was designed to do.
It was a separate data processing engine within Excel developed by another team (SQL server) for the purposes of self service BI (Power Pivot/Query). Most probably, this engine is not used for normal (traditional) calculations.
You mean if you create the pivot table from data in a sheet? It's not needed for that because you'll run out of memory trying to fit any more than about a couple million rows in a sheet first. Meanwhile you can easily load 10s and possibly 100s of millions of rows into Power Query.
You're blowing my mind here a little bit. I don't use MySQL but do use Excel and the R/Python data science stacks quite a bit. I regularly open data with tens or even hundreds of thousands of columns in the latter by using lazy computations. Is this not the case with database approaches? What kind of hardware are we talking? Presumably Excel has to load everything into memory at once since it's immediately viewable, right?
In MySQL, maximum row size for non-text columns is 65,535 bytes. MySQL is limited to 4096 columns per table, and 1017 columns per table if the InnoDB engine is being used.
Excel's limits are 16384 columns and 1048576 rows.
I'm talking wimpy hardware here, I admit. Basically, VirtualBox VMs on a quad-core i5 box with SSD and 8GB RAM. With the VM having three cores and 6GB RAM. But it was the same wimpy hardware for Windows 7 and Ubuntu.
Notwithstanding the row length and column count limits, did you also give MySQL more memory? Setting InnoDB buffer pool size to 3 or 4 GB might help; its default is about 134 MB.
> Presumably Excel has to load everything into memory at once since it's immediately viewable, right?
Depends on the file format. An XLS or XLSB file can contain special markers for where each logical row starts, so it can randomly access rows; both can also persist "calculation chains", which are a simplified dependency graph. The binary formats also store formulae in a parsed representation, allowing easy scans to see which cells have to be inspected if a file needs to be recalculated.
Aha! As far as persisting calculations, this makes sense. I guess in my head I was thinking "pure" data where everything opened was precomputed. Forgive my ignorance about database solutions, but do they not implement something similar?
But now you've got me thinking, it would be nice if libraries like Dask could allow for flagging of symbolic operations like this to be written to disk for quickly saving metadata where intermediate steps don't explicitly need to be saved.
Might be worth a try to look into Postgres. In case of the column problem, PG has a limit of 8 Kilobytes for Column Width (instead of a count of columns). You can circumvent some of the limits by using arrays, types, json or hstore. It also has pl/Python which lets you code up queries containing python fragments for data processing.
Meanwhile <side rant> open any document in Photoshop with a few layers and effects and the computer grinds to a halt, no matter the specs, no matter the year, through the ages. New hardware comes out? Booya, new Photoshop XYZ -> let's put your fancy hardware on its knees, begging for air.
>Meanwhile <side rant> open any document in Photoshop with a few layers and effects and the computer grinds to a halt, no matter the specs, no matter the year, through the ages.
Hasn't been my experience ever. Photoshop is one of the speediest image manipulation programs out there...
The monthly fee for Photoshop is around 1/100th of the old retail price. That works out at about 8 years of use before you have to 'buy' the software again. Seems fair to me.
And for those who don't like paying for Photoshop - which, given it's an astonishingly powerful piece of software, probably means "people who don't actually need Photoshop" - there's always cheap or free alternatives that provide about half the functionality.
>The monthly fee for Photoshop is around 1/100th of the old retail price. That works out at about 8 years of use before you have to 'buy' the software again. Seems fair to me.
That's because you only think of yourself. Creatives are one of the most struggling professions, and can have widely different returns per year, and salaries in different parts of the world can be much lower.
People who could afford to buy Photoshop at some point (perhaps even a used copy), later down the road might not be able to pay the subscription for a few months, but in the new scheme they lose access to the program altogether.
8 years? There are creatives that use 10 and 20 year old versions of Photoshop on Windows.
You pay per year. $120 gets you access to the photo bundle for one year. That's 30x cups of Starbucks drip coffee a year. Creative people who are 'struggling' blow more than that a month on weed, beer, coffee or Tinder. We all make choices that show what we prioritize.
Anyway, the subscription model for this kind of software works really well. Software keeps becoming better and so far Photoshop has been running circles around the limpy thing known as Gimp
P.S. I just got hit with the next year charge. I was annoyed so I tried installing CS 6. It's OK. It still beats Gimp but I would rather not drink coffee or smoke weed for a month to have money to pay for the yearly subscription for the photo bundle.
P.P.S. And if you are really really really struggling "creative" you are probably young and either go to school or have a friend that goes to school which can get you it under the EDU discount, which is peanuts.
> And for those who don't like paying for Photoshop - which, given it's an astonishingly powerful piece of software, probably means "people who don't actually need Photoshop" - there's always cheap or free alternatives that provide about half the functionality.
Now? Absolutely! 20 years ago this was much harder. Adobe used to dominate the market so that you used Photoshop even if you often didn't really need all its complexity. But these days, there is so much competition that you can probably find something that works well enough. I would assume that hurts their bottom line, but as a consumer, I am happy.
> Adobe used to dominate the market so that you used Photoshop even if you often didn't really need all its complexity.
Still does. Part of the reason is that in the past, they turned a blind eye on people pirating Photoshop. Every kid with even passing interest in graphics or photography got themselves a bootleg copy of Adobe tools to work on; it was an easy guess what they'll be using later, as adult professionals.
So true. I used to run European cross-docking ops, planning, inventory, forecasting, and scheduling in Excel back in the day. Successfully scaled by 100% YoY for three years. Was quite a lot of fun.
The issue I have with that approach is not using Excel per se. It is using Excel in addition to whatever system is being used in the first place (SAP, for example, is a pretty popular thing to circumvent with Excel sheets). That, and over-engineering spreadsheets to the point where nobody but the creator can use them anymore. At that point your data just gets suspect. And circumventing existing systems with local offline spreadsheets just screws up everything. Now combine these two.
But that has less to do with spreadsheets themselves (see my first point regarding cross-docking) and more with the application. Used correctly, spreadsheets can be incredibly powerful. And at least for my purposes, I have yet to encounter an analysis issue I failed to dig through using Excel. Sure, something like Python might have worked better, but consider me an empowered user (read: I have no clue about Python or SQL or...). And at least I knew I could trust that the analysis was done as intended.
> It is using Excel in addition to whatever system is being used in the first place (SAP, for example, is a pretty popular thing to circumvent with Excel sheets). (...) And circumventing existing systems with local offline spreadsheets just screws up everything.
I usually side with the circumventers on this one. This misbehavior happens for a reason, which usually is that it's impossible or infeasible to do the work with "proper" systems. Excel sheets on pendrives is what you get when people can't exchange data using "correct" software, or when people need to continuously iterate on the shape of the data set, while getting the schema changed in the "correct" system would involve a ticket to IT and 2 months of waiting.
I think instead of paying people for doing a job and then doing everything possible to make that job difficult, companies should embrace that people will work around any deficiencies of top-down systems, whether software or procedural. But I guess this tug-of-war exists since the time of first corporations.
True that. I used to do both back then, first as a user and then as an SAP system guy. For me it is always a death spiral: people don't trust the system or it is too cumbersome, they use Excel, the system gets even less reliable, they use Excel even more... and so on. I always saw it as my role as a system guy to give users a system they can use and then force them to actually use it. Because when providing the system I talked to them first, so in the end it was kind of their system too.
And then there are examples like the production planning done in Excel because SAP PP just isn't good enough. Then the guy who created said Excel tool left. Years later, when production had to switch to weekend shifts, they couldn't, because no one could adapt the Excel tool. Once you reach that point you are screwed.
Agree but slight modification - goal should always be for the users to be able to do their main functional tasks in-application and maintain it as the system of record.
But there will always be a use for Excel as a tool for informal data-monkeying and analysis - exporting a table and doing some pivots to summarize or get a different view. Non-technical users will always expect this capability and flock back to it. Not everyone's a programmer or has the interest to be - you'll have to pry Excel out of most accountants' cold, dead hands, for example.
You describe the ideal world. Tools like SAP or Business Warehouse suck at data crunching, and not everyone can use Python, for example. Even ops can, under the right circumstances, be OK-ish offline in Excel, as long as results are imported back into e.g. SAP to assure data integrity.
I feel like Power BI is a good solution to overcome Excel's deficiencies while working well with Excel. It forces you to structure your data and define measures while being dynamic, efficient, and friendly to less structured inputs.
I have to give Power BI a serious shot. We have it at my current place but nobody is really using it, so I need some spare time to dive into it.
Now, the more I think about it, the combination of MS Office, Power BI, and MS Dynamics with a shopfront from, say, Shopify would make for quite a nice e-commerce setup.
I do something along the same lines as you describe (build better tools in R/Python for larger data problems), but I have a deep aversion to Excel. It's proprietary, has horrible standards (it does not support native UTF-8 CSV files, for example), and is stuck in the 80s in terms of paradigm. And this is precisely the tool that prevents people from doing things more efficiently, because "they can do it manually in Excel even if it takes a long time". It makes people pick up really, really bad habits.
Excel is a tool. It empowers those who don't know how to code to do their jobs more effectively than before without "learning to code". It's something you keep in your garage in case you need it. It depends on what their current workflow is. Sometimes I'll use Excel, sometimes I'll use R/Python.
What's important is that Excel gives you access to the computing power of modern technology with a lower barrier of entry/knowledge. That's one of the most valuable things that many people fail to understand. Not everyone is a programmer, and for lack of a better term, most "don't really care about UTF-8 and just want the calculations to work". Also, in many cases Enterprise IT in critical infrastructure is resistant to supporting R or Python due to their security models (firewall rules prevent install.packages('tidyverse'), for example).
This is what makes us specialists and someone they bring on board to help them out and explain to them about what the "best-use-case" is, and then watch as people try to cram a square peg into a round hole.
Well, if you're using Python, you're following an imperative/OOP paradigm which goes back to the sixties, so I don't know why you'd criticize Excel for using a dataflow model. I can understand not wanting to use Excel because it's proprietary (but LibreOffice isn't), and I can understand not wanting to deal with a lot of its legacy cruft, but for the stuff that Excel does well, it does it a lot more quickly, more interactively, and more intuitively for most people than a similar solution in Python. In fact, one of the key benefits of an Excel solution is that you can see every step of your calculation which creates a degree of self documentation which you don't get with Python, for example.
> that you can see every step of your calculation which creates a degree of self documentation which you don't get with Python, for example.
If you want to do that in Python then Jupyter Lab/notebook is a good solution to document your code and see the result of each operation. And way faster than Excel for large data tables.
Think of all the open, modern, nicely standards compliant tools that you use. Now think of the fact that Excel lets people solve their problems more easily than all the things you like.
I'm not saying it's good, but it's better than everything else (for the people who use it.)
I am talking about importing CSV files in UTF8. If I use Japanese characters for example it fails utterly to display properly unless I import it as "Text" first.
The secret trick for importing data into Excel (without actually writing a file in the Excel file format): don't try CSV - as you have already noticed that will fail in all kind of fragile ways depending on your character encoding and operating system locale.
Instead, convert your data into an html table and then paste or import that into Excel.
But not any html table, you need to have the appropriate proprietary css properties in there, such as the mso-data-placement:same-cell. Those are actually documented ...in the Office HTML and XML Reference published in 1999.
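A minimal Python sketch of that trick, under the assumption that mso-data-placement:same-cell is the only proprietary property you need (the function name and escaping choices are just illustrative):

import html

def rows_to_excel_html(rows):
    # Build a plain HTML table; the mso-data-placement:same-cell style keeps
    # multi-line text inside a single cell instead of spilling across rows.
    body = []
    for row in rows:
        tds = "".join(
            '<td style="mso-data-placement:same-cell;">%s</td>' % html.escape(str(v))
            for v in row
        )
        body.append("<tr>" + tds + "</tr>")
    return "<table>" + "".join(body) + "</table>"

# Hypothetical usage: paste the output into Excel, or save it as .html and import.
print(rows_to_excel_html([["id", "note"], [1, "line one\nline two"]]))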
I frequently go from SQL query results to HTML table to excel in a similar manner, but I haven't had issues w/ any proprietary CSS properties. Using Paste Special -> Match Destination Formatting it always comes out without any issues
But how do you handle sharing? The main issue with Excel, and its main downfall, is the sharing of data and versioning, where person X has a version a few hours older than person Y's and needs an email from person Y to keep doing her job.
If it's O365 Excel, then you can share and work on the same spreadsheet in a similar way to Google Sheets (seeing other peoples real time cursor in your spreadsheet etc).
Be very, very careful with this. There are some update/race condition bugs in Excel Online that result in multiple users causing bulk updates to incorrectly update the filtered/sorted view of other users.
This race condition happens often. In fact, about every time I try to rely on it with my team. Such a shame Google Drive isn't seen as a "trusted platform" in my client environment.
Sharing Excel sheets is extremely easy - you just mail them, or host them in some online place, like SharePoint if you've bought into the whole MS ecosystem like many companies do. Not sure what your point is there.
Versioning is of course more difficult, though Excel does support diffing in principle. I expect though that what most people end up doing is simply keeping track of versions manually, same as they would have in the days before excel.
Honestly, people handle sharing via Google Drive/Dropbox/Enterprise solution plus file naming convention.
The current enterprise network I'm on? They block Box, Dropbox, and a lot of other solutions. They instead use Citrix's ShareFile and a NAS setup with file naming.
Also remember, surprisingly most of these critical infrastructure planning isn't handled by hundreds of people, at most it's a team of 5 people who then present their findings and suggestions to the C-suite decision makers. It's not as big of a problem as many people think.
We tend to keep ours on Sharepoint - which means we could use Excel Online to work collaboratively, but usually we don't need them all at the same time, so open in Excel desktop and "check it out". If someone else needs it, they can just ping on Skype and see if you're done.
It is amazing, just not in a good way. Maybe I should try to find and post a link to that paper where it was pointed out that Excel was munging things like gene names because they get interpreted as dates in the 'untyped' cell input, and these things were showing up in published research.
Excel is a disaster, it will seem to work until it turns out it's doing something weird behind your back, or maybe you've made a careless mistake somewhere and it has zero tools to help you catch it. And you won't know until it's far too late. Please say no, use anything else - Python, JavaScript or heck even QBASIC or whatever - but just don't use Excel.
A friend of mine works converting spreadsheets to Python/Java programs. Basically, people at the company solve the problem with Excel, and when that gets too slow, he codes it up to read from a CSV and they're off to the races.
It's amazing how far these business types can get before needing a programmer to optimize things. As a programmer, I tend to think in terms of logic, and then I tried to build a complicated budget tool in Python and the turnaround was just atrocious. So, I did it in a spreadsheet instead and was far more productive.
Spreadsheets are awesome, and whoever came up with the concept should get some kind of award. I can't think of a single software tool that has enabled more productivity than a spreadsheet.
And is it just me, or does hopping onto OpenOffice Calc or Google Sheets feel like a different device altogether? They all look like bikes, but accelerate differently, change gears differently, etc.
I love working with spreadsheets! In a past life, I was all about taking data from various sources, processing it, and spitting out CSVs that I would then work on with a data analytics person using Excel. Curious if y'all use a similar approach of generating CSVs, or do you go even deeper and actually create Excel sheets with Excel-native stuff in them (like formulas)?
What I specifically handle is Water. Starting from the natural system (weather generator + Hydrology) through to the human system (Demand model (showing increase over time and temperature increase) + system operations (dams, reservoirs, treatment facilities, pipes, etc.) to delivery to customers.
The big problem we answer is "How can we continue delivering water to people while under climate change uncertainty? What policies or new infrastructure can we build so we remain robust and resilient under these new conditions?"