TL;DR: xsv is probably what you want, or maybe zsv and/or awk
awk can do this super easily. Here's an example snippet that not only shards, but compresses your shards.
```
# expects shard_size, target_prefix, and file_type to be passed in via awk -v
NR == 1 { headrow = $0 }                # capture the header row for reuse in every shard
(NR - 1) % shard_size == 0 {            # ready to start a new shard
    current_n = current_n + 1
    output_file = sprintf("%s%04d.%s.bz2", target_prefix, current_n, file_type)
    print "writing to " output_file > "/dev/stderr"
    # close any prior-opened output_command (else will err on too many open files)
    if (output_command != "")
        close(output_command)
    output_command = "bzip2 > " output_file
    # print header at the top of the new shard
    print headrow | output_command
}
NR != 1 {                               # data rows only; the header is handled above
    print $0 | output_command
}
```
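For example, a minimal sketch of how it could be invoked (the variable values and the `shard.awk` filename are placeholders):
```
# writes shards of 1,000,000 rows each as shard_0001.csv.bz2, shard_0002.csv.bz2, ...
awk -v shard_size=1000000 -v target_prefix=shard_ -v file_type=csv -f shard.awk input.csv
```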
This of course assumes that each line is a single record, so you'll need some preprocessing if your CSV might contain embedded line breaks. For that preprocessing, you can use something like the `2tsv` or `select -e ...` commands of https://github.com/liquidaty/zsv (disclaimer: I'm its author) to ensure each record is a single line.
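A sketch of that preprocessing step, assuming zsv's `2tsv` subcommand reads a CSV file and writes one record per line as TSV (the shard script above is delimiter-agnostic, so the shards simply come out as TSV instead of CSV):
```
# flatten embedded newlines by converting to TSV, then shard with the awk script above
zsv 2tsv input.csv \
  | awk -v shard_size=1000000 -v target_prefix=shard_ -v file_type=tsv -f shard.awk
```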
You can also use something like `xsv split` (see https://lib.rs/crates/xsv), which frankly is probably your best option as of today (though zsv will be getting its own shard command soon).
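For reference, a rough sketch of what that looks like, assuming a chunk size of 1,000,000 rows (the directory and file names are placeholders):
```
# write chunks of 1,000,000 records each into ./chunks/, repeating the header row in every chunk
xsv split --size 1000000 chunks/ input.csv
```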
I think $40 a year just for the split feature is a little too much. However, adding more useful features for manipulating CSV files would probably change my mind. For example, reordering or preprocessing the files as part of the split: if people split into X files, they'd otherwise have to repeat that action X times in Excel.
Check out the Didgets tool at https://www.Didgets.com, which lets you import data from CSV, JSON, or JSON Lines; filter out any unwanted rows; and then export it to CSV, JSON, JSON Lines, HTML, or XML files while splitting them up like this does.
Note: if you're pulling in a lot of S3 list_objects_v2() data and have some honking big object sizes, the Long Integer type craps out at representing a 2 GB file. You need to use Double.
It's interesting how people use and abuse Excel. How many people try to use Excel to process millions of rows? Time to use tools that can directly query CSV files to aggregate the data.
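As a minimal sketch of that kind of direct querying, assuming a simple CSV with no quoted or embedded commas (the file name and column choices are hypothetical), plain awk can already do a group-by aggregation:
```
# sum column 3 per distinct value of column 1, skipping the header row
awk -F, 'NR > 1 { total[$1] += $3 } END { for (k in total) print k "," total[k] }' data.csv
```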
The answer is actually PowerPivot. With Access you can handle a few GB, but with PowerPivot you can do a billion or more rows. Don't expect any insane performance, though.
When downloading open data files from government sites there can be millions of rows.
It would be nice if excel worked with 50M rows to do quick filters and pivots and stuff.
But of course there are other tools for that.
Comically, I'm not sure how splitting a 50M-line CSV into 50 files helps, as you can't filter or pivot across 50 files unless you want to do lots of manual work.
What do you mean by no update? This is a new app completely.
I submitted Superintendent.app 2 times and stopped because the second time hit the front page. The other 2 times it was posted by someone else, which I have no control over...
There are already open-source utilities [0] for users who aren't proficient in Unix commands.
If anything this should be the definition of a one time fee.
You’re free to charge whatever you like, but it seems odd that anyone would pay you year after year to use your app.
It’s not the price, as $40 isn’t that much, but the value and principle of the thing.
[0] https://github.com/philoushka/LargeFileSplitter