Show HN: An app to split CSV into multiple files to avoid Excel's 1M row limit (superintendent.app)
14 points by tanin on Oct 17, 2022 | 32 comments


Why is this a yearly cost? I don’t understand.

There are already open-source utilities [0] for users who aren’t proficient in Unix commands.

If anything this should be the definition of a one time fee.

You’re free to charge whatever you like, but it seems odd that anyone would pay you year after year to use your app.

It’s not the price, as $40 isn’t that much, but the value and principle of the thing.

[0] https://github.com/philoushka/LargeFileSplitter


The target users are non-technical users. Think accountants. They are not technical and would not be able to use command-line tools.

Other than that, I appreciate the feedback.


The target users don't have that amount of data

believe me

I worked as an accountant


It's true that most accountants won't reach that kind of scale, though modern SaaS businesses might be able to reach that point.


I work as a dev _for_ the accountants. We have that amount of data.


You don't understand. They said "believe me". It must mean 100% of the accountants will never ever encounter 1M rows.


First, every accountant I know has a lot of IT skills.

Second, why do accountants need to pay a yearly fee for such a basic utility?


The video (why?) doesn’t load for me, so I can’t check features, but what’s wrong with split and csplit? (https://www.gnu.org/software/coreutils/manual/html_node/spli..., https://man.openbsd.org/split.1)


Just a random question: are you using an iPhone?


iPad


It seems the iPhone and iPad can't play the MP4 for some reason. I may need to change it to a GIF instead.


My “Why?” isn’t about the video not playing, it’s about using a video as almost the sole way to figure out what the product does.


Great point. I suppose I could show some screenshots as well.


TL;DR: xsv is probably what you want, or maybe zsv and/or awk

awk can do this super easily. Here's an example snippet that not only shards, but compresses your shards.

```
# pass shard_size, target_prefix and file_type with -v on the awk command line

# remember the header row so it can be repeated in every shard
NR == 1 { headrow = $0 }

(NR - 1) % shard_size == 0 {
    # ready to start a new shard
    current_n = current_n + 1
    output_file = sprintf("%s%04d.%s.bz2", target_prefix, current_n, file_type)
    print "writing to " output_file > "/dev/stderr"

    # close any prior-opened output_command (else will err on too many open files)
    if (output_command != "")
        close(output_command)
    output_command = "bzip2 > " output_file

    # print header
    print headrow | output_command
}

# every line except the header is a data row for the current shard
NR != 1 { print $0 | output_command }
```

This of course assumes that each line is a single record, so you'll need some preprocessing if your CSV might contain embedded line-ends. For the preprocessing, you can use something like the `2tsv` or `select -e ...` command of https://github.com/liquidaty/zsv (disclaimer: I'm its author) to ensure each record is a single line.

You can also use something like `xsv split` (see https://lib.rs/crates/xsv) which frankly is probably your best option as of today (though zsv will be getting its own shard command soon)
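
To illustrate the embedded line-end caveat above, here is a minimal Python sketch (not part of zsv or xsv) that shards with the standard `csv` module, so a quoted field containing a newline stays in one record. The function name `split_csv`, the `target_prefix` naming scheme, and the four-digit shard numbering are assumptions chosen to mirror the awk snippet; compression is left out.

```python
import csv
from itertools import islice


def split_csv(src_path, target_prefix, shard_size):
    """Write src_path into numbered shards and return the list of shard paths.

    Each shard repeats the header row and holds at most shard_size data rows.
    """
    paths = []
    # newline="" lets the csv module handle line endings inside quoted fields
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return paths  # empty input: nothing to write
        while True:
            # pull the next shard_size parsed records (not raw lines)
            rows = list(islice(reader, shard_size))
            if not rows:
                break
            path = f"{target_prefix}{len(paths) + 1:04d}.csv"
            with open(path, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(rows)
            paths.append(path)
    return paths
```

Something like `split_csv("big.csv", "part", 1_000_000)` would then write `part0001.csv`, `part0002.csv`, and so on, holding one shard of rows in memory at a time.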



This site probably isn't your target audience....


Unless you want competition and people to realize people would pay for such a thing


I think this is a lesson in product-market fit: there is no market for this app as it stands, so it will have to adjust to find its fit (or go away).

It would be fascinating if there ended up being a market for this as it’s so counterintuitive for me.


I think $40 a year just for the split feature is a little too much. However, adding more useful features for manipulating CSV files would probably change my mind about it. For example, reordering or preprocessing the data before the split: if people split into X files, they'll have to repeat that action X times if they do it in Excel directly.


Check out the Didgets tool at https://www.Didgets.com which will let you import data from CSV, JSON, or JSON Lines; filter out any unwanted rows; and then export it to CSV, JSON, JSON Lines, HTML, or XML files while splitting them up like this does.


I'm thinking about adding the filter/sort feature soon-ish.

Just curious: what kind of features (apart from filter/sort) might change your mind about paying for this app?


I got smacked with this just last week.

My answer was just import into MS Access.

Note: if you're pulling in a lot of S3 list_objects_v2() data, and have some honking big object sizes, the Long Integer type craps out on the size of a 2 GB file. You need to use Double.
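
For reference, the arithmetic behind that limit, as a small Python sketch. It assumes Access's Long Integer is a signed 32-bit value and that Double is an IEEE 754 double, which holds every integer exactly up to 2^53; the helper names are made up for illustration.

```python
LONG_INTEGER_MIN = -(2**31)     # signed 32-bit lower bound
LONG_INTEGER_MAX = 2**31 - 1    # signed 32-bit upper bound: 2,147,483,647
DOUBLE_EXACT_MAX = 2**53        # integers up to here are exact in a double


def fits_long_integer(size_bytes):
    """True if a byte count fits in a signed 32-bit integer."""
    return LONG_INTEGER_MIN <= size_bytes <= LONG_INTEGER_MAX


def exact_as_double(size_bytes):
    """True if a byte count is guaranteed to be stored exactly in a double."""
    return abs(size_bytes) <= DOUBLE_EXACT_MAX


two_gib = 2 * 1024**3  # 2,147,483,648 bytes, one more than LONG_INTEGER_MAX
```

So an object of exactly 2 GiB is the first size that no longer fits, while a double still stores byte counts exactly far beyond that.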


It's interesting how people use and abuse Excel. How many people try to use Excel to process millions of rows? Time to use tools that can directly query csv files to aggregate the data.
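
As a rough illustration of aggregating straight from a CSV without loading it into a spreadsheet, here is a minimal Python sketch that streams rows with the standard `csv` module and keeps only running totals in memory. The function name `sum_by_key` and the column names in the usage note are hypothetical.

```python
import csv
from collections import defaultdict


def sum_by_key(lines, key_column, value_column):
    """Sum value_column per distinct key_column over an iterable of CSV lines.

    `lines` can be an open file or any iterable of strings; rows are read
    one at a time, so the row count is not limited by available memory.
    """
    totals = defaultdict(float)
    for record in csv.DictReader(lines):
        totals[record[key_column]] += float(record[value_column])
    return dict(totals)
```

With a file on disk this could be called as `sum_by_key(open("sales.csv", newline=""), "region", "amount")`.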


I happen to build that kind of tool as well: https://superintendent.app

Anyway, if you don't know SQL, you are kinda out of luck and stuck with Excel.


Or one of the billion tools designed to work with big csvs.


Could you list the top 1000 tools? I would like to explore.


The answer is actually PowerPivot. With Access you can do a few GB, but with PowerPivot, you can do a billion or more rows. Don’t expect any insane performance though.


Do people really have CSV files with more than 1M rows? How do you search and update such files? Why not use some database?


When downloading open data files from government sites there can be millions of rows.

It would be nice if Excel worked with 50M rows to do quick filters and pivots and stuff.

But of course there are other tools for that.

Comically, I’m not sure how splitting a 50M line csv into 50 files helps as you can’t filter or pivot across 50 files unless you want to do lots of manual stuff.


The only use case I can see is script kiddies toying with credential stuffing using Open Bullet 2 and well-known compromised accounts.


so this will be shared again with no updates in a few months? https://news.ycombinator.com/from?site=superintendent.app


What do you mean by no update? This is a new app completely.

I submitted Superintendent.app 2 times and stopped because the second time hit the front page. The other 2 times were posted by someone else, which I have no control over...



