
Fun to think about, but in the real world, no question neatly divides people, even the gender one. To quote Reddit's u/tailcalled[1], the exo-software/meatspace world is even less standardized than the software world:

Falsehoods programmers believe about gender: http://www.cscyphers.com/blog/2012/06/28/falsehoods-programm...

Falsehoods programmers believe about names: http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...

Falsehoods programmers believe about addresses: http://www.mjt.me.uk/posts/falsehoods-programmers-believe-ab...

Falsehoods programmers believe about time: http://infiniteundo.com/post/25326999628/falsehoods-programm...

More falsehoods programmers believe about time: http://infiniteundo.com/post/25509354022/more-falsehoods-pro...

Falsehoods programmers believe about geography: http://wiesmann.codiferes.net/wordpress/?p=15187&lang=en

[1] http://www.reddit.com/r/programming/comments/1fc147/falsehoo...



I dislike some of the examples of the "falsehoods programmers believe about time" because they typically aren't falsehoods, and will always hold true within the constraints of your system. Yes, time is a fickle thing that is marred by history, but in building a system, I choose a representation for the data I am storing in it. If I choose to store dates using a Gregorian calendar (because I'm following ISO-8601), then,

> Months have either 28, 29, 30, or 31 days.

Does hold true: months do have 28 to 31 days, by definition. If someone wants my Gregorian date in a different calendar system, then we must convert, but that is a display issue. Otherwise, we're comparing apples and oranges, and of course all bets are off.

It's kind of like time: the hour "2 AM" never repeats itself on random politically appointed days, because I have chosen to store timestamps in UTC. If someone wants to see that timestamp in "PST" (or America/Los_Angeles), then that's a display issue that can be accommodated, but it does not suddenly violate the fact that "2 am" never repeats in UTC.

Of course, in real life, people don't know what timezone their date is stored in and things love to break randomly on DST switch-overs or leap days or the ends of leap years.


2am doesn't even repeat in the local timezone, it's just that the timezone changes. So here for example, 1:59:59 am CDT increments to 1:00:00 CST, which does not conflict with the earlier 1:00:00 CDT. I think it would be better not to go through all this hassle though, myself.


Nitpicking a bit, but I'd usually argue that the TZ doesn't change: You went from "America/Chicago" to "America/Chicago". The timezone encapsulates more than just the offset: it also includes when and how DST works. Merely, you went from "1:59:59 am America/Chicago DST=true" to "1:00:00 am America/Chicago DST=false". I'd say this is just terminology, since "CDT" and "CST" work just as well (and how I envision it in my head, mostly due to the official TZ name being "America/Chicago").
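For anyone who wants to poke at this, here is a minimal sketch of that fall-back second using Python's standard `zoneinfo` module. It assumes Python 3.9+ with a tz database installed, and it uses the 2013-11-03 transition as the example date; the `describe` helper is made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs a tz database on the system

CHICAGO = ZoneInfo("America/Chicago")

# The last second of daylight time on 2013-11-03, pinned down in UTC.
before_utc = datetime(2013, 11, 3, 6, 59, 59, tzinfo=timezone.utc)
after_utc = before_utc + timedelta(seconds=1)

# Same zone name on both sides; only the offset and abbreviation change.
before_local = before_utc.astimezone(CHICAGO)
after_local = after_utc.astimezone(CHICAGO)

def describe(dt):
    """Hypothetical helper: wall-clock time plus abbreviation, offset and fold."""
    offset_hours = dt.utcoffset().total_seconds() / 3600
    return f"{dt:%H:%M:%S} {dt.tzname()} (UTC{offset_hours:+.0f}, fold={dt.fold})"

print(describe(before_local))
print(describe(after_local))

# The earlier 1:00:00 (still CDT) is a different instant from the later one.
# Compare in UTC: same-zone comparisons look only at the wall-clock fields.
first_one_am = datetime(2013, 11, 3, 1, 0, 0, tzinfo=CHICAGO, fold=0)
gap = after_utc - first_one_am.astimezone(timezone.utc)
print(gap)
```

The `fold` attribute is how Python tells the two 1 AM hours apart while the zone stays "America/Chicago" throughout, which matches the terminology point above.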


2 AM might not repeat in UTC, but midnight does if your internal representation uses unix timestamps (or similar - .Net doesn't count leap second ticks either)

http://en.wikipedia.org/wiki/Unix_time
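A small sketch of the repeat, assuming the leap second inserted at the end of 2012-06-30 as the example. Python's `calendar.timegm` just does the plain "86400 seconds per day" arithmetic, so it is used here only as a stand-in for how Unix time counts, not as a claim about what any particular system clock does during a leap second.

```python
import calendar

# 2012-06-30T23:59:60Z was an inserted leap second.
# Plain Unix-time arithmetic has no slot for it, so it lands on the
# same number as the midnight that follows.
leap_second = calendar.timegm((2012, 6, 30, 23, 59, 60))
next_midnight = calendar.timegm((2012, 7, 1, 0, 0, 0))

print(leap_second, next_midnight)
print(leap_second == next_midnight)
```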


If you store time in UTC, then you haven't solved all possible time storage problems. For example, you still need to take into account that some days have 25 hours, so any structures for exchange/storage of periodic data (hourly planning schedules; by-minute temperature readings) need to contain a variable number of data points instead of a fixed set of 24 hours. If your HR system is doing employee scheduling for 24/7 shifts, then those days may cause a lot of issues no matter how you represent that data.

And the 'display issues' are nontrivial - it's not just converting a timestamp to a string; it has tricky consequences for UI layout and printing if those night hours matter.


> some days have 25 hours

I'm curious to know in what circumstance you think a UTC day has 25 hours.

> it has tricky consequences for UI layout and printing if those night hours matter.

This is very true. I don't believe Google Calendar (or any calendar app that I've seen, really) handles displaying DST weirdness well. It would be very tricky to render. I'd love to see something attempt to tackle this.


You can (and likely should) choose to store your locale-specific times in UTC, but you often can't choose to define 'days' as UTC days.

If your days are meaningful for your app in any way, then you have to have clear boundaries between days/weeks/months. If your website has a 'downloads per day' report, then it doesn't mean UTC days (which for many locations would mean splitting in the middle of business hours). And it has days where the difference between 'start-of-day' and 'end-of-day' is not 24 hours, but 25 hours.

Also, no matter how you handle time storage, if you're doing any analytics, and your process is minute-dependent instead of day-dependent (say, power consumption, not purchases), then your daily totals will have ~5% jumps twice a year that you might need to adjust.
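To make the 25-hour day concrete, here is a rough sketch that measures a local calendar day in real elapsed time. It assumes Python 3.9+ with a tz database; `local_day_length` is a made-up helper name, and the 2013 US transition dates are just examples.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs a tz database on the system

def local_day_length(year, month, day, tz_name):
    """Elapsed time between local midnight and the next local midnight."""
    tz = ZoneInfo(tz_name)
    start = datetime(year, month, day, tzinfo=tz)
    # Adding a timedelta to an aware datetime moves the wall clock,
    # so this is simply "midnight on the next calendar date".
    end = start + timedelta(days=1)
    # Subtract in UTC; same-zone subtraction would ignore the offset change.
    return end.astimezone(timezone.utc) - start.astimezone(timezone.utc)

print(local_day_length(2013, 11, 3, "America/Chicago"))   # fall back
print(local_day_length(2013, 3, 10, "America/Chicago"))   # spring forward
print(local_day_length(2013, 11, 3, "UTC"))               # UTC days never change
```

One extra or missing hour out of 24 is roughly a 4% swing, which is about the size of the jump in daily totals described above.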


It is explicitly not a display issue.


I'd suggest, "Do you self-identify as female?" There are slightly more women than men, so your bifurcation is "female" and male/intersex/genderqueer. I'd suggest that intersex and genderqueer people are currently a small enough population that it would be close to a 51/49 ratio of other to self-identified women.


The question should be about biology, not stuff like gender. Biology solves this problem (most of the time)


Not really. The stats have not been thoroughly calculated (to my knowledge) but here is an attempt: http://www.isna.org/faq/frequency

Not XX and not XY: one in 1,666 births

Klinefelter XXY: one in 1,000 births

There are other relevant cases, not necessarily measured here, like mosaicisms and chimerism.

"Do you self-identify as female?" is a more answerable question.


but doesn't 1:1666 mean the success rate is more than 99%?

Asking about the possession of an XX chromosome pair should separate people almost 50:50; the rest is something else.

On the other hand, maybe there are just <1% of people who don't identify themselves as male or female, so the questions would make no difference...


| On the other hand, maybe there are just <1% of people who don't identify themselves as male or female.

Huzzah for logic. But note that the question was, "Do you have 1 apple" and not, "Choose a statement: 'I have 1 apple', 'I have 2 apples'"


True story, but "Do you have XX-Chromosomes?" would do the trick and cut the mass of people at around 50%.


All that matters is that the answer stays consistent for purposes of the 33 questions.


> even the gender one

Sure, but if you ask "do you consider yourself classically male" and "do you consider yourself classically female", you'll get the vast majority of people, so you can still eliminate large swaths of population with either of these.


The interesting nature of the problem is that you can't just 'eliminate swathes', your question must evenly divide the entire population.


I don't think that that's a requirement at all, unless you think that his second example question of a list of countries perfectly divides the population with not even a rounding error


If you don't evenly divide the remaining population of the earlier questions, you'll need more than 33 questions.
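A back-of-the-envelope sketch of how much an uneven split costs. It treats each answer as carrying the Shannon entropy of its yes/no split and assumes the questions are independent and that the world population is about 7.1 billion; both are assumptions for illustration, and the result is a rough lower bound rather than a construction of actual questions.

```python
import math

ASSUMED_POPULATION = 7_100_000_000  # rough figure, an assumption

def bits_per_answer(p_yes):
    """Shannon entropy (in bits) of a yes/no answer that is 'yes' with probability p_yes."""
    if p_yes <= 0.0 or p_yes >= 1.0:
        return 0.0
    p_no = 1.0 - p_yes
    return -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))

def questions_needed(population, p_yes):
    """Rough lower bound on the number of questions, if every question splits p_yes / (1 - p_yes)."""
    return math.ceil(math.log2(population) / bits_per_answer(p_yes))

for split in (0.50, 0.51, 0.60, 0.90):
    print(f"{split:.0%} yes: about {questions_needed(ASSUMED_POPULATION, split)} questions")
```

Under these assumptions a 51/49 split still fits in 33 questions, while a 60/40 split on every question already pushes the count past 33.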


For the purposes of this problem (and most other purposes) if you have a question that divides answers in 47%:A 51%:B 2%:'stupid_question_doesn't_fit_me_I'll_answer_randomly', then it's still perfectly okay.


No, answering randomly is not okay. You will not be able to reproduce the bit string if there is a random answer any more than two source texts with a single bit changed between them will produce the same hash.
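A quick sketch of the hash analogy, assuming the 33 answers are packed into a string of 0s and 1s and fingerprinted with SHA-256. The `fingerprint` helper and the sample answers are hypothetical.

```python
import hashlib

def fingerprint(answers):
    """Hypothetical helper: turn a list of yes/no answers into a SHA-256 hex digest."""
    bit_string = "".join("1" if answer else "0" for answer in answers)
    return hashlib.sha256(bit_string.encode("ascii")).hexdigest()

# 33 made-up answers from one person.
consistent = [True, False] * 16 + [True]

# The same person, but one answer was a coin flip that landed the other way.
one_random_answer = list(consistent)
one_random_answer[20] = not one_random_answer[20]

print(fingerprint(consistent))
print(fingerprint(one_random_answer))
print(fingerprint(consistent) == fingerprint(list(consistent)))
```

Identical answers reproduce the identical digest every time; a single flipped answer gives an unrelated one, so the person can no longer be matched to their earlier bit string.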


Yes, there's so much about these.

Of course, in practice, you usually target your system to a narrow set of users at first.

But yeah, if you're Facebook, or work with an airline booking system, for example, you will most likely hit every single item on these lists.


> you usually target your system to a narrow set of users at first [but eventually the exceptions show up]

Reminded me of the College Humor sketch about security questions: http://www.collegehumor.com/embed/6936880/security-questions...


"but eventually the exceptions show up"

Yes. And they will be pissed/disappointed, especially if you're preventing them from doing something (their job, for example).

Great examples on the video! What's "impossible" today is obvious tomorrow.


"not until I saw the newsweek article and I thought 'thats me!'"


That's how I feel when sites expect me to have a phone number. I've moved on to more modern replacements for that communication technology (hint: it uses the internet and doesn't care where in the world you are).


>... 33 'Yes' or 'No' general questions that, when answered correctly, uniquely identified everyone on the planet.

I think a lot is possible with this challenge. You could compress over 8 billion yes/no questions in a single yes/no question under these rules.

And if that doesn't neatly divide people, why not let the people divide themselves through unique thought?

A 33-bit hash would probably collide too much. Yet there seems no requirement to communicate your hash back to the creator of the questionnaire with your answers. It could be a 1024-bit hash of a short story like:

  Hi I am blauwbilgorgel. I currently define myself as male.
  My internet names are ... I lived at ... I think we are
  in Time Cube. Today is Setting Orange. I declare ...
That would create a unique hash with which to identify yourself.
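As a rough sketch of what a 1024-bit hash of a story could look like: the comment doesn't name an algorithm, so this assumes SHAKE-256 from Python's `hashlib`, which can emit a digest of any length. The `story_hash` helper and the sample story are made up.

```python
import hashlib

def story_hash(story, bits=1024):
    """Hypothetical helper: hash a free-text story to a digest of the requested bit length."""
    # SHAKE-256 is an extendable-output function, so the digest length is a parameter.
    return hashlib.shake_256(story.encode("utf-8")).hexdigest(bits // 8)

# A made-up story; any text the person can reproduce exactly would do.
story = "Hi, I am example_user. Today I am writing a short story about myself."

digest = story_hash(story)
print(len(digest) * 4, "bits")
print(digest[:32] + "...")
```

The catch is the same as with the 33 answers: the story has to be reproduced character for character, or the digest changes completely.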


But, then again, so would "I am John Doe and live in 23 Maple Road, Kentucky, US, 12345, Apartment 1a", that's why the post office uses it.


You can't identify everyone in the world that way though. There are people with the exact same name living in the same house, and there are people who don't have a postal address.


Well, gender is good enough already: "Are you female?" does not imply everyone saying no is male, and it is close enough to 50% that it is not a big deal (2^33 leaves you with some 1.5 billion people of leeway).
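The leeway arithmetic, spelled out as a tiny sketch. The population figure of 7.1 billion is an assumed round number, not something from the article.

```python
ANSWER_PATTERNS = 2 ** 33           # distinct yes/no strings across 33 questions
ASSUMED_POPULATION = 7_100_000_000  # rough figure, an assumption

leeway = ANSWER_PATTERNS - ASSUMED_POPULATION
print(f"{ANSWER_PATTERNS:,} patterns, {leeway:,} to spare")
```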


Not only does a "one-sided" question like this get you "close enough", the worldwide male/female gender disparity (around 60 million more males than females in 2010) is almost certainly larger than the worldwide intersex/trans/etc population.

A question like, "Are you a male or a Mexican female?" gets you even closer, though.


Interesting lists.

I was initially confused about the first one. Then I realized that the author mentioned working on HR systems and it clicked. But for most of us who aren't building HR databases, I honestly think most professional programmers don't have to think about gender in their work nearly as much as this list suggests - the biggest thing I could come up with would be for localization, where some strings may need to be tweaked for the gender of people or inanimate objects.


Mirror for the gender post: http://pastebin.com/raw.php?i=25bnhuBC



