Hacker News
Learning from Code vs. Plagiarism
22 points by cogs on March 26, 2019 | 17 comments
Pop is a new code search tool for Mac aimed at students and other coders who are learning a new programming language or API.

In the marketing we're trying to walk a narrow line: encouraging people to look at existing code and learn from it, without encouraging plagiarism. (http://etia.co.uk/pop/about/)

Say I want to calculate the interval in seconds between two timestamps, and I'm getting an error on my line of Python code:

    start_time = datetime.now()
If I search for uses of datetime in some open source code and find this:

    start_time = datetime.datetime.now()
I might learn that the now method belongs to a class inside the datetime module, itself called datetime. This fixes my problem, and I think that's ok because I've learned where the now() method lives in the module namespace, rather than copied anything.
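
A minimal sketch of the two import styles behind that error, assuming the failing script began with a plain "import datetime" (the post does not show its import line). The alias DateTime and the variable names are chosen only for this illustration.

```python
import datetime

# With "import datetime", the name refers to the module, so now() has to be
# reached through the datetime class that lives inside it.
start_time = datetime.datetime.now()

# The module itself has no now attribute, so the short spelling fails.
try:
    datetime.now()
    module_has_now = True
except AttributeError:
    module_has_now = False

# Importing the class directly makes a short spelling work. It is aliased
# here (assumption for this sketch) so it does not shadow the module name.
from datetime import datetime as DateTime

also_start_time = DateTime.now()
```

Which style to use is mostly a matter of taste; the error only appears when the import style and the call style are mixed.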

On the other hand, if I see code like this:

    (datetime.datetime.now()-start_time).total_seconds()
I've learned that I can subtract results of datetime's now method and that whatever kind of object I get back has a total_seconds method. But I'm a bit less comfortable. Any code I write is reusing the algorithm, even if my code ends up with different variable names and splits this logic over additional lines. On the other hand, algorithms aren't patentable, so maybe it's ok to use the algorithm.
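
The subtraction can be sketched as follows. Fixed timestamps are used instead of now() so the result is predictable, and the function name elapsed_seconds is made up for this example.

```python
import datetime


def elapsed_seconds(start, end):
    """Return the interval between two datetime objects in seconds."""
    # Subtracting one datetime from another yields a timedelta.
    delta = end - start
    # total_seconds() converts the whole timedelta to a float.
    return delta.total_seconds()


start = datetime.datetime(2019, 3, 26, 12, 0, 0)
end = datetime.datetime(2019, 3, 26, 12, 1, 30)
print(elapsed_seconds(start, end))  # prints 90.0
```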

We don't seem to talk much about this. I'm convinced we can learn a lot from looking at other people's code; architects don't avoid looking at other architects' buildings, and authors don't avoid reading other authors' books. It's surely a good way to learn best practice.

Do HN'ers look at other people's code? Where do you draw the line?




I think you confuse software development with I don't know what.

All the time we look at other people's code. See Stack Overflow. If it is only a couple of lines, that is also not plagiarism; that is code reuse, which in software is encouraged (see the DRY principle). If you can take some code (as is) and just change some params, that is just using a library or a framework. And libs are usually free to use if they are publicly available. If you just copy the library and say it's your work, that is plagiarism.

In software we don't just look at code, we take snippets and reuse them (ALL THE TIME), like an architect snipping a window design out of someone else's plans and reusing it. A couple of lines of well-written, clean code should be reused. Don't write your own interpretation. Either you have a better, cleaner version or not.


That's what I used to think too.

But some (admittedly junior) developers I've spoken to said that there was so much concern about IP theft they weren't allowed to look at other people's code (I assume Stack Overflow was an exception).

And this link https://academia.stackexchange.com/questions/100081/does-two... is about someone who was accused of plagiarism because of two lines of code.


I think the standards in academia are quite different than those in the workplace, primarily because the expected outcomes are quite different.

In academia, the instructors are attempting to tease out your understanding of the underlying concepts. Cutting and pasting even two lines of code from an online source doesn't show the instructor that you understand the concept.

In development, snipping well-written pieces of code is an efficient use of time, assuming the programmer is competent enough to understand the code they just snipped. We reuse common libraries for the same reason; once you solve a problem with a testable, proven solution, it's not a good use of time to have every developer redo that solution.

The main caveat, as illustrated in another comment in this thread, is the issue of licensing. Although I certainly have snipped from good Stack Overflow answers, the times I build my own libraries from scratch I often crib from well-proven open source solutions with permissive licenses instead of rolling my own. It should be standard practice (and I recognize that I'm preaching to myself as much as anyone else) to reference those snips, not only for IP integrity but to provide a trail of breadcrumbs for your future self (or anyone else investigating your code) as to where that code came from and the rationale for it.
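
One way such a breadcrumb might look is sketched below. The helper unique_in_order is a hypothetical example, and the source and license fields are deliberately left as placeholders to be filled in with wherever the code really came from.

```python
# Adapted from: <URL of the answer or repository the code came from>  (placeholder)
# License:      <license of that source>                              (placeholder)
# Rationale:    order-preserving de-duplication without adding a dependency.
def unique_in_order(items):
    """Return the items with duplicates removed, keeping first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order([3, 1, 3, 2, 1]))  # prints [3, 1, 2]
```

The comment block costs three lines and answers the two questions a later reader is most likely to ask: where did this come from, and why is it here.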


I think that is only an issue in education where you have simple homework whose best solution you can usually pick from any library. Which is why you usually do such stuff in pseudo-code rather than a real programming language.

In the real world, Guava (Google's Java collection libraries) has tons of code that is virtually identical to other libraries. The thing is, very often you re-implement the same thing so as not to add another dependency for some small thing. Also, you usually use some library or framework to implement most of the things often found in student homework. You simply should not write your own code for such basic things but use a library that does it for you in clean and tested code.

When the task from your professor says do not use java.util, this may have an educational purpose (when the aim of the game is to create your own HashMap implementation) but it is an antipattern in actual programming. Do it when you make your own library, like an immutable version or like the koloboke compact version. But those derivations have a purpose; you don't just make your own HashMap implementation so as not to infringe on some copyright.

Your example mainly focuses on the usage of existing libraries. If you write an AWS Lambda function and use the AWS SDK, you often can just copy an example and change the params, without referencing the source of your example snippet. There is no copyright issue IMO. There is often a right way and a wrong way, and the right way is the cleanest, most concise way. A Lambda is maybe just 20 lines of JS code, and often 90% of those 20 lines is copy-paste. Don't write your own original spaghetti code; if you can write a more concise and cleaner version, just please share it. An instructor that does not clearly teach that IMO isn't a good coding instructor.


The link isn't a good example, or maybe it's the exception that proves the rule. XOR/shift is a well-known technique for hashing and I wouldn't expect an experienced engineer to need to cite it, unless they copied the specific constants.
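
For readers who haven't seen the technique, here is a minimal sketch of XOR/shift mixing. It is not the code from the linked question; the shift amounts 13, 17 and 5 are the ones from Marsaglia's xorshift32 generator, borrowed here purely as an illustration (and cited, since they are specific constants). The bucket helper is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def xorshift32(x):
    """Scramble a 32-bit integer with three XOR/shift steps."""
    x &= MASK32
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x


def bucket(key, n_buckets):
    """Map an integer key to a bucket index (hypothetical helper)."""
    return xorshift32(key) % n_buckets


print(xorshift32(1))  # prints 270369
```

The general idea is common knowledge; the particular triple of shifts is the part worth a citation.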

The question author is a student and thus the expectation to cite is increased, unless they independently came up with the idea.

There are no hard-and-fast rules, but the more the example is a straightforward application of unknown (or forgotten) knowledge, the more unsurprising that knowledge is, the less I feel the need to cite. To take it back to the OP's example, once you know that subtracting two datetime objects yields a timedelta object, it becomes obvious to take the result's .total_seconds() (if that's something useful to you). Conversely, my older code is littered with citations for things that are obvious and second-nature to me now but weren't at the time.


The link provided here is an incredibly bad example for the argument. The answers to the question detail how the poster actually plagiarized, not only in two lines of code but in other parts as well. Unless the total line count was only 10, MOSS would not report 40% plagiarism and drop to 20% once the hash function was removed. Just read the answers to the post.


Knuth has an interesting take on this, as he learned by reading programs. He claims if the programs he first saw were too good, he wouldn't have bothered with programming: https://github.com/kragen/knuth-interview-2006/blob/master/R...

"I read the manuals that came from IBM, and it had; the manuals had example programs in there, and I thought of better ways to write those programs. I thought of, you know, well, okay, this program works, but if you did it this way, it would be even better. And so that’s given me some confidence that maybe I had a talent for programming. Now, if the manual hadn’t had these bad examples in it, I probably would not have gotten interested in programming, because I wouldn’t have this confidence, and I would have been scared and say, oh, I would never think of this. But the fact is, the manuals were pretty stupid, and that’s what gave me the confidence that I should think a little more about programming, because I might be, you know, I might be good at it."


This is an extremely grey area that honestly comes down to the patent laws of your country, the uniqueness of the code you're reusing or porting, and your intended use for that code.

This is where it becomes important to use open licenses on public projects, as it lets people know "It's okay to use this code" instead of simply putting the source out there and leaving its intellectual ownership as an ambiguous question.

In my opinion, there is nothing wrong with using code from others' projects or works as long as you're educating yourself or, if used commercially, your use of said code is solving a different problem than the project you took it from. This is just an opinion with no real consideration given towards legal implications.


Excepting academic environments or IP issues, what you call "plagiarism" I would call code sharing, and see nothing wrong with it.


For personal projects, anything is fair game. Once you're actually distributing your program/library to users, I think attribution for any non-trivial/non-standard snippets or algorithms is a good look (just in the form of in-line comments, possibly in the header comment block). As long as the licensing works out (for example, GPLv3 snippet in GPLv3 project, or CC/MIT in a permissive open project), I see no problems there. Obviously, for closed source or otherwise restrictive software projects (especially when working for an employer), things get more complicated. While you won't likely get caught or called out for lifting a chunk of code off Stack Overflow (where user contributions are licensed under CC BY-SA, which carries attribution and share-alike terms), it's worth reconsidering.

But as far as looking for inspiration in other people's FOSS code? Morally that feels like very fair game. Again, attribution would be nice for an actual algorithm.


Reading code is essential to growth as a programmer. I do it all the time, often for pleasure. But reading snippets without context, like you're suggesting, is a common source of bugs in my experience.

So often when working on a legacy codebase, I'll track down a bug to some bit of code that makes no sense, and then search for that code and find it on Stack Overflow or similar. Even examples from documentation can be dangerous taken out of context; I can think of numerous times I encountered this with snippets that had been copied directly from MSDN without understanding.

Given that poorly-understood code reuse was implicated in many of our industry's greatest disasters, maybe the legality isn't what our concern should be here.


You are going to be looking at code a lot more than you are going to be writing it. In most cases you will be looking at code in an attempt to ascertain a larger pattern that is being used in a code base so you can follow suit. Looking at code to see how a library is actually used will be the least (very least) of your worries and will become second nature.


i generally avoid looking at other people's code unless 1) it's written for teaching purposes and aims to teach a concept I want to learn or 2) I have to because i'm having an integration problem

I think looking at well-written code is a great way to learn best practices, but I think it's quite hard to do that in practice and probably not the best way.

there's a lot of garbage code out there and even if the code is good, oftentimes you find that it's super out of date (using an old ass version of a language) or the logic is pretty obfuscated by unessential (to you) backwards-compatibility / cross-platform compatibility checks.

there's also a great deal of context around the intent of blocks of code that you don't have when looking at open source code in isolation in the example you gave. Some lines might seem like nonsense until you read the PR and understand the tradeoffs made at that point in time.


What if you're having trouble with an API, you don't look at code to see what kind of arguments a function needs? You just go to the docs? What if the docs are sparse?


see my point #2

> 2) I have to because i'm having an integration problem

i'll prob go to the docs first and source code last. that said, my point isn't "avoid looking at source code". It's "I don't seek out source code of libraries for the purposes of learning".

if i'm trying to get stuff done, i don't give two shits how a third party tool wrote its fancy functions. that's the point of abstraction.

and most of the time, when i'm using third party tools i'm just trying to get shit done.


Generalizing, but I draw the line between

> "Oh, I didn't think of doing it that way but it makes perfect sense!"

and

> "Uh... Does this do what I want? Guess I'll just copy/paste and see if it works"

In other words, if you see something and understand what it does and how it does it, then I don't consider it plagiarism.


I believe code should be freely shared, and in general (although there must be some upper-limit), the concept of "code plagiarism" is a non-starter for me.

With that said, I would hope that in general people would not copy and paste code. Follow along, write the code, and understand how it works.



