Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Excel_to_code – Converts some Excel spreadsheets to C (github.com/tamc)
92 points by gmac on July 24, 2015 | hide | past | favorite | 59 comments


I've got a project in a similar veign here: https://github.com/donatj/XLS-XML-Formula-Expander

It's ancient and flawed but it converts a SpreadsheetML ( https://en.wikipedia.org/wiki/SpreadsheetML ) aka "Excel 2003 XML" spreadsheet to PHP in real time.

There is a VERY complicated configuration tool for a major steel company that runs on this still, last I checked anyway. I left that job ~5 years ago. I've done a few small updates since then for them, but not much.

It was nice though because it allowed the company to just submit a new Spreadsheet and their tool would automatically update. I was very proud of it at the time. Haven't had reason to maintain it though, but it works much faster than other tools that do similar for PHP.


As a developer (or someone with some scripting or DBA skills), it's pretty easy to scoff at Excel being used as a way to internalise business rules but the thing about it is that it's this ubiquitous language that a lot of people understand.

We've saved ourselves a lot of hassle implementing application integration solutions by outsourcing domain-specific number crunching/mapping to a spreadsheet in our integration platform. It enables us to treat the customers' business rules as a black box - feed inputs against named ranges and read out named ranges (https://developers.flowgear.net/kb/Node:Excel_Workbook).


How did you go about converting the Excel workbooks into a usable program which you can feed data into and get back computed results?


I think Joel Spolsky (?) is often quoted by saying that the easiest way to parse ms office file formats is to just install it on a server and use the documented API to Word, Excel,...

I used this approach regularly about 4 years ago in my previous job to interact, query,... with data stored in "excel databases".

So, while I'm not the parent commenter, my suggestion would be to have a windows server that runs both my interface software, and excel. You load the workbook, populate input fields, let excel do the processing and query output fields/workbooks/...

Of course the excel sheet could also contain VBA code actively pulling out data from a DB, heck it could even call code in custom shared libraries. But I doubt that passing around workbooks full of code amongst it illiterate users will do any good. The less magic they contain outside of stock cell formulas the less chance of users messing up the computation.

If you want to play with it, look up the win32 modules shipping directly with ActiveState's python distribution.


That's about how we do it. Excel (and its COM API) was never intended for multi-tenancy or consumption from a service. That said, if you run the service under a named user account (as opposed to SYSTEM or NETWORK) and make sure you won't have users accessing the file and locking it, you're good to go.

If you need to invoke the same sheet repeatedly, you can build a cache of range output values keyed on a hash of range input values. This enables you to bypass Excel for a state that you've previously seen.

It's also pretty easy to inject and execute script (as opposed to invoking existing scripts) - that deals with the problem of users being able to see or edit VBA.


> users accessing the file and locking it

I'd say that having such a file on a network location accessible by your users is a pretty bad idea. As always, a proper develop/test/release process should be in place, otherwise great confusion about the actual revision of a "Excel Program" in production will ensue with absolute certainty.

Treating it as a binary blob doesn't preclude one from implementing at least a basic form of revision control. You'll not be able to create useful diffs between revisions, but at least one can identify them.



I've got a system that has been working quite nicely. Basically it's a python server that interfaces with a headless instance of LibreOffice Calc. Spreadsheets are uploaded to a directory that is polled for changes. The python server opens/closes the spreadsheets in LibreOffice. The python client connects to the python server and can modify and read cells in any spreadsheets. Each spreadsheet is 'locked' when it is being accessed so to avoid two simultaneous modifications/requests giving incorrect results.

LibreOffice nowadays works pretty well with excel spreadsheets. It also means that as LibreOffice improves so does my excel sever. I'm also working in the GNU/Linux world so this is a great solution for me.


This sounds similar to a reporting system I wrote in about 1999 using Word automation API. It sat on a windows NT box and watched for changes on a netware share. That would load up a text file, read it, parse it and pick documents and fill in bookmarks in the documents then print it to a printer or PDF file.

Worked wonderfully apart from the memory leaks :)


In fact I always thought it should be a feature of Excel. Take a worksheet, define inputs and outputs, and make the worksheet a function which can then be reused to apply for instance to a table of data.

There are many things that could be improved in Excel. But Microsoft seems to have unstaffed all developpers from the office team and that product seems to be in maintenance mode. The only innovation in 10y is a button that enables to share a spreadsheet on facebook. Will be very popular in a corporate environment.


> For small spreadsheets this will take a minute or so. For large spreadsheets it is best to run it overnight.

Ah, Ruby.


Any insights why the conversion is so slow? Intuitively, converting a spreadsheet doesn't sound like it should be a high complexity task.


Where do you start? No seriously where would your program start and where would it finish? Most spreadsheets are a tangled mess of interacting cells.


Excel manages to process it in real time.


Yep for all its faults it's an impressive program.


In fairness to Excel, I think most of it's faults are more down to user misuse (eg people creating "databases" in Excel) rather than problems with the package itself.

Though Excel's handling of CSV can sometimes be a little troublesome - particularly with date / time fields. And I wish it support Open Document Spreadsheet (.ODS). But the former is a fringe case and the latter is obviously just a business decision. So, as you say, it's a very impressive program.


> The generated code uses floating point, rather than fully precise arithmetic, so results can differ slightly

Use with caution.


Excel already uses floating point internally, so if you're worried about compatibility, this is less risky than switching to arbitrary precision math.


Floating point tweaked to make it look like it uses decimal arithmetic. See http://www.eecs.berkeley.edu/~wkahan/Mindless.pdf, starting at page 3.


Wow, didn't realize that about excel (though I have observed it in google sheets). I wonder how many hedge funds rely on sheets with catastrophic cancellation issues.


I tried 0.1 + 0.2 in both google docs and excel. I got 0.3000000000000000000, not 0.30000000000000004, like you normally would using floating point.


I think there's internal rounding that hides it. Try A1=1e50, B1=A1+1, and C1=B1-A1.

Excel definitely used floating point in the past (because you used to be able to use it to test for the Pentium FDIV bug) and according to the documentation, it still does: https://support.microsoft.com/en-us/kb/78113


Rounding happens when the value is displayed. To see the raw value, try: =(0.1+0.2)


I expect all financial math is fixed point.


It's absolutely not. Everything I've seen has been 64-bit double. If you need to, you round either intermediate or final calcs to a set precision.

I once worked on replicating a particular index, and the engine used fixed point internally. In order to get the values to match with the ones published officially, we had to emulate Excel floating point: our fixed point values wouldn't line up with theirs.


I'm not sure why ppl downvoted you. But from what I've heard - never implemented myself - what you're saying is common... At least as far as using separate fixed point values to represent the left and right sides of the decimal place.

Now I'm sure for things like computing interest by raising things to a power or even computing percentages, etc, floating point is used (and eventually reduced to fixed point values), so maybe someone found it too absolute to say "all" financial math is fixed point? Anyway, if someone with experience can confirm or deny, it would be neat...


Because he is wrong? Excel uses 64 bit floats.


Really interesting project. The source code is a bit scary[1] but I suppose that's to be expected (machine) translating Excel to C. Good work.

[1] - https://i.imgur.com/q2KZRR1.png


unrelated but you should see the generated code that matlab/simulink outputs from their Realtime workshop. This reminds of that


Not really understanding why one would want this?


Because a lot (a lot) of companies have a staggering array of business logic sewn into spreadsheets. Eventually that might end up having performance issues, so transforming some of that logic to C is, at the very least, an interesting solution to a problem.

Everyone looks down on them, but spreadsheets really are incredibly functional, and a lot more accessible that most programming languages.


And let's be honest. If you ask an internal IT department to do in code the equivalent of what you do with a spreadsheet in a day, they will prioritize it, discuss specs for months, go away for another 6 months, come back with something that has nothing to do with what we are trying to achieve, then go away again. By the time it is up and running you probably don't need that stuff anymore. And of course you will never be able to update it. And if it stops working, IT will hide behind their ticket systems, which they seem to spend more time tweaking and upgrading than the applications they are supposed to care for.

Ahhh. Corporate IT. I may miss it one day. But not today.


Right, I get that... but why C? Presumably if you're using Excel you have access to the whole MSFT stack, so it just seems weird to use a Unix language when it seems likely that some VS/C# tool has already been built for this.

I know C/C++ works just fine on Windows, but the command-line tools on cmd.exe leave a lot to be desired.


One advantage that C has over most other languages is that it has a very simple and widely-supported ABI that can be embedded into virtually anything else under the sun, without additional dependencies.

That's especially valuable in this case because the kinds of organizations that have lots of business logic in Excel spreadsheets are unlikely to want to switch programming languages to take advantage of a tool like this. Given that automatically-generated code is likely to be ugly no matter what language it's written in, it makes sense to target the lowest common denominator that everyone can benefit from.


I think this might be geared at companies who want to convert Excel workflows to a POSIX based cloud solution. If you want to combine moving to the cloud (e.g. for multi-user) with reducing vendor lock-in, this makes some sense I think.


What makes C a Unix language?


Well, C was designed for the purpose of writing Unix.


Lots of answers to this one, including "It's not really", but the most commonly cited connection would probably be Dennis Ritchie.

He was half of the original Unix team, and most of the original C team.

There are also cross-overs in terms of "design ethos". (complex structures should be comprised of well-composed combinations of simple structures)


It's used in parts of engineering too. Matlab is more typical for complex models, but a reasonable number of models end up being implemented in Excel.


Also, in addition to the other responses, excel is great for quickly modeling problems/systems with lots of inputs and outputs and calculations in between e.g. discounted cash flow.


many business "workflows" still run on excel.

there might be some uses for non-technical users, but yeah... it's sort of madness to do this for a production service.


Nice work! I built something similar that keeps everything in Ruby. It was made to allow models built in Excel by a non-programmer to be executed on the back-end of a Rails app, but it's pretty generic: https://github.com/agworld/soroban

I had big plans for the project; the next step was to convert the models to JavsScript so they could be executed directly on the client. But, like most of my pet projects, this has remained a pipe dream.

Good luck with it!


Surely I'm not the only one that remembers someone doing this with Lotus 1-2-3 in the 80's...how far we've come. What's next, a database in a spreadsheet? Word processor? Modem application?


At my last company, someone wrote a format to encode RDF data in spreadsheets.

Google Sheets was our primary tool for configuration and content authoring for way too long. Eventually, he added support for OOXML sheets, too. At some point, a long time later, we started migrating to TTL, and thank goodness for that.

The code was all open-sourced, by the way, but I'd rather not link it here because I don't really want this username associated with that company.


> The code was all open-sourced, by the way, but I'd rather not link it here because I don't really want this username associated with that company.

Any search clues? It sounds really interesting and I've been exploring the possibility of using spreadsheets as a widely understood authoring tool.


I've no idea if it's related, but here is one I'm familiar with:

https://code.google.com/p/owlpopulous/


> What's next, a database in a spreadsheet?

You have no idea, poor innocent soul... :(. I hope you won't touch excel-based databases, it's really awful...


I've got the scars, 123, Symphony, Quattro, Paradox, Excel, Access, dBase II, III, IV, Lotus Notes.....OLE!


Add VisiCalc and TurboCalc to that list and I'll have to say "Yeah, me too!".


You might not believe this but my boss builds web app mock-ups using Excel. I was surprised to find that out but it works.


Excel is either brilliant or horrible (depending on your role in dealing with the output) at being re-purposed to do "the wrong thing".

I've had form templates sent to me in Excel because of the easy-to-use table layout controls. I've had games sent to me in Excel because VBA was so damn easy to program simple ideas in. I've had PoCs of algorithms sent to me in Excel because the graphical chaining of each step is so illustrative.

I've also seen the nightmare that haunts businesses who define their business logic through a bunch of interconnected Excel files. As an aside, this seems like an area ripe for "disruption" (remember when "disruption" was a bad word?), which is where tools like excel_to_code could probably make a large impact. If you could validate and test the entire business logic, as expressed in Excel files, you could reduce or eliminate the "Excel defined business rules" nightmare. If you can do this, there are thousands of SMEs that will be knocking at your door.

I personally don't use Excel for much, and I used to deride people who used it for "the wrong task", but I've recently had to swallow my pride, and accept that it's a serious power tool for most non-programmer people. I'd even go so far to say that it's the most successful "visual programming" tool out there. And by a long way.



>What's next, a database in a spreadsheet?

You seem to imply that's not already the case (though many wish it wasn't)


vlookup pseudo-joins ahoy !


I'll point out that if you find yourself with a RDBMS-like situation in Excel I highly recommend using Excel 2013+ 'tables' feature which adds foreign keys and better pseudo-joins (in pivot tables) inside of Excel. Better than anything hacked together with VLOOKUP. (heck, I've even seen some people hack it together with only SUMIF)


I use tables all the time in Excel 2010, haven't had much experience with 2013. Is this something that is truly new in 2013?


I don't know, I have no experience with 2010 so I went with the lower bound I can guarantee as opposed to potentially giving harmful wrong information. Can't edit my comment now.


I did something like this at work to convert these huge excel sheets to a compressed format. I normalize the excel xml files with an xslt then I extract of the meta data I store in named ranges.





Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: