The questions which are really poor in quality end up being deleted. We did some analysis of deleted questions and our results were published at WWW 2014 [0]. The dataset of deleted questions (up until June 2013) is also publicly available [1]. We also built a machine learning system to predict questions that would be deleted in the future using various features [2].
Questions are also deleted years later when the moderators arbitrarily decide to. They refused to delete my questions when I deleted my account, which made sense because they didn't want to lose all that information. I had over 100 questions, maybe 150 questions, after all.
Tonight, however, someone deleted my question because I discussed it on HN.
I agree with you. However, the problem is that we can't consider something as a true positive (for a paper) based on gut or intuition. We have to rely on some sort of ground truth and this is the best sanitized sample we could get.
If I may highlight, though, your sample is completely flawed:
1. The close vote queue's backlog is such that most bad questions remain open and therefore don't ever get a chance to be deleted in the first place unless (as you've noted) a moderator randomly runs into them months later.
2. A problem experienced by SO is one-off accounts: users that get question-banned end up coming back for more with a new account, and leave the abandoned ones unmaintained. Where a normal user may end up deleting their downvoted questions, an abandoned account's user will not.
Put another way, your sample would be a heck of a lot larger if everything that should likely get deleted would be.
> If I may highlight, though, your sample is completely flawed:
I would respectfully disagree. :)
> The close vote queue's backlog is such that most bad questions remain open and therefore don't ever get a chance to be deleted in the first place unless (as you've noted) a moderator randomly runs into them months later.
A question can be deleted even if it was not closed earlier. Being closed is just one way for a question to end up deleted (refer to Figure 1). In fact, only 14.38% of the deleted questions had been marked as 'Closed' (refer to Table 6). Your intuition about "moderator random runs" was, however, found to be true. In case you are suggesting that most 'closed' questions should be considered future 'deleted' questions, that is a call I would not like to make, IMO. In any case, we found that deleted questions are poorer in quality than closed questions (refer to Section 4.5, subsection "Quality Pyramid"). In that light, it makes sense to treat them as separate entities.
TL;DR - Deleted questions are beyond repair but Closed questions could be patched up
> A problem experienced by SO is one-off accounts: users that get question-banned end up coming back for more with a new account, and leave the abandoned ones unmaintained. Where a normal user may end up deleting their downvoted questions, an abandoned account's user will not.
This may be true but it does not necessarily mean there is a problem with the dataset sample. It simply means that the problem is more challenging :)
> Put another way, your sample would be a heck of a lot larger if everything that should likely get deleted would be.
Since all closed questions do not get deleted, this may not necessarily be true. But, since you suggested closed questions, we had a prequel to our deleted questions paper which was about analysis and prediction of closed questions [0, 1]. You might like that as well!
I assume you meant the code. Please feel free to use it. It is hardly "mine" because it just puts 'sklearn' functions together. In case that is not appropriate, what license do you suggest?
Sadly the Stackoverflow community seems to have become a lot more officious in the last few years and the number of petty bureaucrats has skyrocketed. I asked a genuine question which I didn't know the answer to and which I was hoping someone else may have come across. It was immediately downvoted for not having a proposed solution. Eventually it was upvoted and became positive because it was a genuinely useful question. That kind of officiousness is completely unnecessary and needlessly turns people off participating in the community.
I just had a similar experience - I posted an (IMHO) well written question that was immediately downvoted with no comment. It has no answers because I suspect few people are going to look at a <0 question.
This upset me more than it should because in a vulnerable moment of need, I was digitally slapped. SO used to feel like a special gem of altruism where devs helped each other out for the love of it but now it feels like a faceless, cruel and nasty place that I want to avoid. I don't think we fully account for the emotional impact of such events when we introduce voting\karma type features.
I make an effort to upvote questions and comments/answers that have been unfairly downvoted. The same problem plagues HN for anyone who dares to defy the groupthink.
May I ask which part of the community, which tags? I hear a lot of these complaints here on HN about the unfriendly/bureaucratic/aggressive nature of SO but don't seem to encounter it all that much while using it. Which could mean either I'm not bothered by it enough, or I really don't see it, or it's not there for the tags I'm mostly active in (c++/labview/matlab/msbuild/c#). Is it possible there are a couple of subcommunities on HN, some (I'd guess web-related or so, since that is a field I never looked into) more violent than the others?
I'm a hobby dev of 12+ years. I've yet to find a polite IRC channel or discussion forum that offers genuine help. The online community of hackers is MUCH unlike the real-world community of hackers that you'd meet at a conference etc, in my experience. =[
I've been surprised by how helpful some of the devs on the mozilla #servo channel are to newbies, NOT like other IRC channels I could name. In my experience smaller communities are nicer on average.
I completely agree. I've asked a couple of questions now that had lots of genuinely useful answers, and were then closed for some perceived rule violation.
Assuming for the sake of argument that the moderators were correct and my question was off-topic: it's really rude for half a dozen moderators to pile on and all say so. It gave the impression I was out of line / being told off and it really put me off continuing to be active on the website.
I agree that the messaging displayed for on-hold questions can seem like you're being ganged-up on. In reality, the way that questions get put on-hold (and eventually closed) is this: there's a review queue that users with enough reputation can go to, where they will be presented with a question that has been flagged as being "off-topic" (or having some other problem). The reviewer can either agree, disagree or skip the question. There's no discussion with other moderators around the matter, and you don't even know who the other reviewers are until you cast your vote. So the "put on-hold by John Doe, Jane Doe, etc." message that ends up on your question is not intended as a rebuke; instead it's meant to force the reviewers to act responsibly, since they won't have the shield of anonymity. This is in addition to many other checks that StackOverflow puts in place to try and prevent reviewer abuse. Point being, try not to take it personally when your question is put on hold or closed; nobody's trying to tell you off (they would do that in the comments if anywhere).
serverfault.com is the worst community in that regard. Asking a technically interesting question about an uncommon usage scenario always provokes grumpy and short comments complemented by downvotes. I always include a detailed explanation, yet nobody seems to care. Questions are closed as duplicates despite my pointing out that the use case is unique. I don't bother using the site anymore.
That must have been an exception to the rule though. If something is down-voted it has a significantly higher chance of getting more down-votes, and vice versa.
You might think "doh". But if you get down-voted, try deleting the post and post the exact same content twenty minutes later. You will be surprised.
I was expecting horrible questions, basically spam. Nope, these are real and legitimate questions and I don't see why most of these were downvoted. I wonder what the data looks like beyond just some of the worst questions, and whether the trend is going to be more and more "legitimate questions that are either 'obvious' or already answered get downvoted." If so, that cannot be a good thing.
The actual spam and completely incoherent questions are deleted. So what you're left with is things that are actually legitimate questions, so don't merit admins intervening and deleting, but which touch a nerve in some other way; like people asking about how to send out large volumes of (presumably) unsolicited email, people asking homework questions, and the like.
Also, note that some of these "most downvoted questions" are also among the most upvoted as well, having very large positive scores, like https://stackoverflow.com/questions/1642028/what-is-the-name.... Once a question becomes famous enough, there are likely to be a number of contrarians who don't like it and downvote it, and eventually that can add up.
And then if you look in the comments of some of them, the ones that seem not too bad but have more downvotes than you'd expect, they got a downvote brigade because they complained about a few downvotes on meta. Pretty lame that people do that, but it happens.
These are only questions that for some reason received a lot of visibility and as a consequence were seen by more people who thought they weren't interesting. They were also seen by more people who thought they were great questions and upvoted them.
The only conclusion I draw from this database query is that “most downvoted” is not an interesting metric.
You are right that these are old. The only way for a question to be in the top ranks for visibility is to have had a long time to be visible over. This doesn't mean that duplicate questions do not get asked, answered and upvoted. Too much so in my opinion.
I see lots of complaining over SO moderation in this thread and other places, but the actual examples shown always turn out to be very sensible moderation. For example, your question is clearly not appropriate for SO. You are not asking a specific question, you are just asking people to write code for you. That somebody was nice enough to actually write your code for you does not mean it was a good question!
Now there might actually be an audience for a "write my code" website, where users could post programming tasks or homework exercises, and other developers then might try to solve them for the fun and challenge of it. But SO clearly does not want to be that kind of website.
Actual examples always turn out to have very sensible ex post facto rationalizations of moderation decisions. Yes, there's always a good argument - this question is too short, that question is too long, this question looks too much like a homework assignment, that question looks not enough like a homework assignment, etc.
The point is, when you look at even a few examples taken together, you realize that those "sensible moderation decisions" cover pretty much the entire space of possible questions. There is no good question on SO. Moderation seems to be applied unevenly and sometimes totally at random.
This is why I don't even ask questions on SO - I have no idea how to write one to be sure I won't end up at -100 (or +100, but Closed As Not Constructive, which is SO's way of highlighting actually interesting questions).
Well there are thousands upon thousands of questions on SO which are not closed, so some people apparently still manage to ask in such a way that the questions are not closed. But of course it could be totally random as you suggest. But can you provide an example of a quality question that was in line with the SO guidelines, but still closed?
To put it bluntly, it would have been much more educational for you if you had actually tried to code it yourself. If, while doing so, you had encountered a concrete problem or question which you couldn't solve on your own, you could have posted it on SO.
But it's perfectly acceptable to ask how to split a string in C#? That's not covered many other places? It's a duplicate, yet highly up voted question.
I see this attitude a lot and frankly it is quite annoying. If the guy wanted a lecture he would have asked for a lecture. He wants to learn how to do something without spending 100 hours in it. I know that feeling, lots of us have been there. You are not his mama to lecture him.
Look who asked this question. "Not designed for this purpose," indeed. Joel and Jeff weren't sitting around trying to think of reasons for people NOT to use their site.
> Now there might actually be an audience for a "write my code" website, where users could post programming tasks or homework exercises, and other developers then might try to solve them for the fun and challenge of it. But SO clearly does not want to be that kind of website.
Bountify, if you're prepared to pay for the code. Codegolf on SE if you want to explore the range of answers that meet the spec but which can't be turned in as homework.
No, this is not a good question for StackOverflow. You simply asked for someone to write code for you, which is not what StackOverflow is intended to be for. It is intended for help on a particular specific problem you are having with your own code.
It was a short well-defined question with given input and a specific output. Elisp isn't the most widely used language so coming up to speed could be time consuming. Someone else even starred it so they must have found it to be useful.
If I'd just ask to "Calculate someone's age in C#", that'd be a much better question?
It'd be interesting if they would just register something like shitpile.com, set up an instance of Stack Exchange on it, turn off the moderation game, and push closed and deleted questions there instead of disappearing them.
That would at least sort of be an experiment on whether all of the decisions they have made about what Stack Exchange sites are 'supposed to be' actually matter or not.
Are you saying that you can't find a point of view where it would be a more specific question? Because that's what I meant when I said 'arguably'.
Saying that you can find a point of view where it isn't more specific isn't really that interesting (to me) as a response to that. Hopefully the rest of my comment is enough to see that I was not particularly defending the SE status quo.
Anyway, I was just referring to the first sentence. The rest of your comment is an interesting proposal, though at this moment I don't have any thoughts on it worth sharing.
I am sure you could classify a fair number of SO questions as "asking someone to write code for you".
I have been down voted a number of times for "where to start with X (in order to do Y)" type questions - where I am in the same situation and don't have the time to read everything in the docs, to produce what is likely a few lines of code.
The SO response is along the lines that it will generate too many answers or opinions. They are usually closed without receiving any answer whatsoever. I have pointed this out to moderators, but they just post back links to meta pages.
And this exactly is the problem with SO. There is no good question for that site, the classification seems arbitrary and totally random. I don't participate in this site for a very simple reason - I have no idea how to ask a question that will be OK by SO rules.
(the fact that you can't be in any way contributing to the site until you asked enough questions to bump up your karma is another blocker)
It is not that difficult. I have asked many questions and never had anyone close them. If you follow common sense and state the question clearly and specifically, you should be fine.
I sometimes vote to close questions, but it is always due to pretty obvious laziness like not actually asking a specific question ("Here is my code... please help me") or overly broad questions ("I'm trying to write an e-commerce web site. Can anybody point me in the right direction?")
Echoing other sentiments, this is not a good question at all. You haven't shown code that you've tried, you're not asking a specific question about the problem you're having writing such code, and (worst) you're basically asking someone to write the code for you. This is a code request, NOT a question about coding.
Apparently. That's a pretty sick move. In other words, you can't talk even about stackoverflow on another forum without stackoverflow moderators trying to take control of the discussion.
Whoever did this on SO should have their moderator privileges removed pronto, that's a very bad piece of advertising for SO and hurts its image much more than the rest of this discussion and the original article combined.
Yes, that's how they roll on StackOverflow. Hope they fixed the bug so the guy who answered the question didn't lose his points. He was upvoted 13 times.
That wasn't a question at all. It was letting people know you wanted them to write code for you.
If your "question" starts out, "I'd like to create...", and finishes with the expectation that somebody else will write the code for you, it's a pretty good bet you're going to get downvoted.
Is there any kind of Stack facility for complaining about the quality of moderation? Because the moderation here is so clearly wrong that I'm not sure the people who made that call should have posting privileges on the site, let alone moderation.
Actually, that's exactly the kind of question that StackOverflow wants to discourage; people just using it as requests for someone else to write their code for them.
First: even if it's true that SO wants to discourage people from asking questions where they're starting from scratch and haven't written any code themselves yet... it's still a terrible moderation, because the explanation doesn't communicate anything close to expressing that. It says the question is unclear. That's flatly false.
Second: is this really the kind of question SO doesn't want? Their privilege, if so, but in addition to being well-defined it's clearly content that more than a handful of people were interested in (at least 15 upvotes on the most popular answer, multiply it by the ratio of non-user visitors to user visitors). It's a programming question, it has a programming answer, and it certainly has instructional value as an elisp recipe beyond whatever intent the author had.
And as for whether it's lazy -- that's basically the charge here, right? -- I'm not sure I see a functional difference between someone asking a question like this and someone who has written a larger pile of code and is missing a narrow piece of knowledge to complete it. There's possibly arguably a contextual/rhetorical difference where in the latter situation we see the context which led to the question being asked and in the former we don't... but honestly, nobody who's fundamentally asking other people to just do their work for them is asking about text->html transformations in elisp, it's pretty easy to read between the lines and see that.
To underscore the point, let's just compare it to some other questions:
It's pretty easy to see what most of these much more popular questions have in common with the one we're talking about here -- as part of whatever these askers are doing, they know there's a capability embedded in a tool that's part of the environment. They just don't know how to bring it out. You could, of course, just tell them to go RTFM for awk, until they get something simple like deleting a line from a file when it matches a string you know, but then again, you could also tell people to go more carefully read the relevant documentation for most programming problems, and if you accept the question and the answer you've got relevant content people are looking for on the site and a recipe people can start learning more from.
The moderation works based on votes for one of a few pre-canned reasons from several different users. If enough users vote to close, the pre-canned reason that the most users voted for is the one that gets listed. That means that it doesn't always exactly match the reason that the question was really closed, but it means that it's possible to give an approximate rationale for a close that was performed by several people, not a single person.
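To make that mechanism concrete, here is a minimal Python sketch of how a displayed close reason could be picked from several users' votes. The threshold of five votes, the reason strings, and the function name are all assumptions for illustration; this is not Stack Overflow's actual implementation.

```python
from collections import Counter

CLOSE_THRESHOLD = 5  # assumed number of votes needed to close a question


def displayed_close_reason(votes, threshold=CLOSE_THRESHOLD):
    """Return the most-voted reason once enough users have voted, else None."""
    if len(votes) < threshold:
        return None  # not enough votes yet: the question stays open
    reason, _count = Counter(votes).most_common(1)[0]
    return reason


# Hypothetical votes from five different users.
votes = ["unclear", "too broad", "unclear", "off-topic", "unclear"]
print(displayed_close_reason(votes))
```

In this made-up example two of the five voters picked a different reason, yet only the majority reason would be shown, which is one way the listed reason can end up not matching why some voters actually wanted the question closed.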
I have answered a lot of questions on StackOverflow. One of the worst types of questions is the "please, write this code for me" question. There are frequently lazy or incompetent programmers who just ask for code to do their job, along with students who are asking people to help them cheat on their homework. So many people on SO are very quick to downvote and close any such questions.
Now, there's a grey area. Asking how to do one specific task which is a small component of a larger program is different enough than asking someone to deliver a completed program to you that it's tolerated. Asking something like "how do I use this one facility in a language" is different than asking "please write this tool for me."
There's also just the fickleness of the crowd. Any site which is based on voting, like StackOverflow, Reddit, HN, and so on, can have some questions upvoted and some downvoted due to sheer bad luck, different sets of people looking at different sets of questions, and so on. Trying to expect that a large group will act in a coherent manner, when only a small subset of that group ever even sees any given question (and with strong bias, as different people follow different tags), is just going to set you up for disappointment.
> nobody who's fundamentally asking other people to just do their work for them is asking about text->html transformations in elisp, it's pretty easy to read between the lines and see that.
It's not a "please write this code for me" question. Rather, it's "please grep your ~/.emacs.d and paste the defun that you wrote (or borrowed from someone) years ago". The Elisp code lifecycle is different from that of usual code, and its reusability is much higher than that of other kinds of code. I have bits of Elisp in my config which were written in the '80s, obtained in very much the same manner: by asking for it on mailing lists. The code is there, but its discoverability sucks, which is why questions such as this are legitimate in the case of Emacs, even if it would be different for other languages and environments.
Not taking into account specifics of a particular domain - along with questions being judged by people who have no idea what is being asked, but feel the need to judge - was the very thing which made me leave SO.
You are clearly not a frequent user of Stack Exchange because you do not know about the meta sites which are available to discuss moderation decisions. And if you are not a frequent user who are you to say that "the moderation here is so clearly wrong that I'm not sure the people who made that call should have posting privileges on the site, let alone moderation"?
That sounds exactly like my experience. Challenge the moderators on anything, and you will be downvoted to hell. It's like normal site downvoting on steroids. "Do not question the mods."
It was removed from Meta. If there are archives, perhaps my user id can link back? Are the deleted messages purged? My user id from the message: user568866
I liked this idea, but noticed the gap between up/down votes was actually quite large, meaning the questions weren't actually that controversial. I went ahead and made a change that filtered out results where ABS(up - down) is less than half of MAX(up, down). (It's quite a bit messier than that, though, as there's no MAX function in SQL that selects the greater of two operands.)
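For anyone who wants to play with the same test outside the query, here is a rough Python sketch. The field names `up` and `down` and the sample rows are made up for illustration, and it assumes the intent is to keep only the questions whose vote gap is small relative to the larger count (the wording above could be read either way).

```python
def is_controversial(up, down):
    """True when the up/down gap is less than half of the larger vote count."""
    larger = max(up, down)
    if larger == 0:
        return False  # no votes at all: nothing to be controversial about
    return abs(up - down) < larger / 2


# Hypothetical sample rows, not real Stack Overflow data.
questions = [
    {"id": 1, "up": 120, "down": 100},
    {"id": 2, "up": 300, "down": 20},
    {"id": 3, "up": 10, "down": 45},
    {"id": 4, "up": 0, "down": 0},
]

controversial = [q["id"] for q in questions if is_controversial(q["up"], q["down"])]
print(controversial)
```

Python's built-in `max` takes the place of the two-operand maximum; in a SQL dialect that lacks one, a CASE expression comparing the two columns typically fills that role.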
I think I am definitely going to start using the '-->' operator, it is awesome! If I mix in with a few 'Yoda conditionals' then the world is really going to love my code.
The question about the single-layer neural networks being closed really bugs me. The entire reason Stack Overflow is so great is that there are experts there who can answer my stupidly specific questions.
If I wanted generic answers to a problem, I can Google them and find some half-assed forum post from 5 years ago.
The mod explanation for the Neural Net question is revealing:
"This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet"
That's a canned response. When users vote to close a question they have to pick from a list. When a question is closed the most popular reason for closing is automatically shown.
I guess it tells us that people will vote to close even if a question doesn't fit into any of the close reasons, and that a previously "too broad" question would be closed as "too narrow" if you tighten it up.
I can't stand Stack Overflow. Quite obviously it is dominated by people who would much rather show you who's the boss by competing, not for who can provide you with the most useful answer, but by who can be first to find a reason why your question doesn't deserve an answer (and then to block anyone willing to answer from being able to do so).
Yes, of course, I understand the assertion: that by attacking questioners instead of helping them, the alpha dogs are laboring selflessly to defend the quality of Stack Overflow. You should thank them for biting you.
What nonsense. Imagine if Google took the SO approach to quality and prioritized making you jump through hoops with your question over trying to find the best answer it can: "your phrasing is ambiguous; if you think otherwise, you can try rephrasing your query properly and try again", or "you have misspelled one of the search terms; you should figure out which one and look up the correct spelling before submitting a search; this is, you'll agree, in everyone's best interest". Or, "not a constructive search query; might lead to pages that include opinions". Or how about, "some of the words in your query resemble those in another query made several years ago; here are their results; if their results aren't helpful to you, you should have asked a better question."
Of course SO, unlike Google, stores the questions, but SO could still learn from Google's example. Google is so useful, because instead of improving quality by strictly limiting what it will index and what you can ask ("no pages expressing opinions", "index no new pages discussing any aspect of a topic if some aspect of that topic was discussed on another page already in the index", "the answer to your query has been found but will not be displayed, because it was not of interest to our moderators"), it includes as much real (not fake) content as it can and uses technology to lead any question to the most relevant answer it can.
How ironic that when you use Google to find a programming answer, it so often judges some Stack Overflow entry most relevant. Then, if you follow the link, you find that someone was trying to provide exactly the answer you were looking for, but the mods spotted him, ruled what Google had correctly deemed most relevant to be "Not Constructive!" and forbade anyone else from answering.
I'd love to see an alternative to SO that optimized for answering questions instead of controlling them. One that allowed almost any programming question, allowed anyone who wanted to answer to freely answer, that collected reams of overlapping explanations and answers to multiple variations of programming questions, and that used state of the art technology to lead questioners to those answers that were likely to be most successful at answering their specific question, while leaving the question open to anyone who wanted to volunteer new answers, even if they are repeats or opinions, or provide their own best-guess links to previous answers.
I don't think it's clear at all. For someone who visits the site without participating (for example to read an answer that showed up in a Google search), they see a bunch of good answers. That looks like a high quality site. Edited to add: And if their question was asked and deleted because it was considered "bad", it never shows up in a Google search.
You can't say the mod culture is not improving the quality of the site because you haven't provided the original content without moderation to see what it looks like.
A question needs visibility to get such a large number of downvotes. These are controversial questions which get a high number of both up and downvotes.
[0] http://www2014.kr/wp-content/uploads/2014/05/proceedings_p63...
[1] https://archive.org/details/stackoverflow_2013_05_deleted_po...
[2] https://github.com/denzilc/stackoverflow-deleted-prediction