Almost definitely false. Provided you weren't doing anything intentionally malicious with it, the risk would be that regulators might fine the bank for inadequate controls. As such, the bank might fire you for doing something that could lead to such a situation, but I don't see a criminal charge. There was actually quite a decent bit of "unapproved" software in use at one of the banks I worked in - mostly stuff that was in the process of approval, but that could take forever, so it was reasonably common for teams to run through the checks themselves (security scan, license review, etc.) and move forward while the official review confirmed no issues.
Well the login message I was greeted with on every ssh connection certainly threatened criminal prosecution for unapproved software at the extremely large bank I worked at.
Unlikely? Sure. But a lawyer somewhere thought it was worth reminding me 10x/day, so I'm going to assume it's possible, provided your unauthorized software caused a serious monetary loss.
The opening words of ACM95a my year are seared into my mind:
"I would like to apologize to the students who took this course last year. I always aim for a mean exam score of 50 and a standard deviation of 10. Last year's mean was 29 and I will attempt to not repeat the mistake"
Haha. I remember AMA95's first lecture where Prof Cohen said "the course catalog describes this class as introductory. It is, but make no mistake, it is not elementary."
AMA95 had a reputation for being a trial by fire. Doing well in it meant you were going to graduate.
Tough as he was, I liked Prof Cohen. He was a no nonsense kind of guy, and obviously enjoyed his subject. I was sorry to read he passed on recently.
At least I didn't hear about people being haunted by 95 afterwards. For us math majors, who had to take Ma108 instead of 95, it tended to haunt us.
For years or even decades afterwards the number 108 would show up. Call a busy tech support line and get put in queue, and you are told "The average wait time right now is 108 minutes". Check after lunch to see if it is time to go back to work...it's 1:08. That number would just show up way more often than it should have.
I know one person who fought back. She took the intro to digital electronics class and the intro digital electronics lab class in her senior year, instead of in her freshman year like most people did, which means she took them after 108.
For her electronics lab project she built herself a digital alarm clock (the traditional project). But her clock was special. It skipped 1:08, instead holding 1:07 for two minutes then going to 1:09.
> It skipped 1:08, instead holding 1:07 for two minutes then going to 1:09.
Haha, love that story. Reminds me of another student who was going to build a digital clock that only displayed the time, very accurately, in 15 minute intervals. Because, he reasoned, nobody should need time more accurate than that!
I, too, built a CMOS digital clock for a freshman lab project out of about 40 chips. It did not work, it just blinked the display LEDs erratically. I still have it, it still does not work and I still don't know why.
My senior EE91 lab project (single board computer) did work, though, but I misplaced it somewhere in the last 40 years :-(
AMA95 is one of the few Caltech classes that I still think about, mostly because it was not useful. Almost all of my other classes have been useful over the years.
I wish that the math sequence for non-math majors had included linear algebra instead.
One example of an adversarial university environment is how fraternities and sororities keep copies of exams and assignments from prior years. Professors know cheating is rampant, so have to change the questions every semester.
Some courses at Caltech had almost identical exams for at least a decade when I went through. The professors knew cheating like the above simply would not be tolerated by undergrads.
I sat on and helped run the Board of Control, which handled academic honor code violations, for several years and professors who had been at other universities would absolutely rave about how much more they could trust Caltech students. And that was while reporting a suspected cheating case to me.
This honestly sounds ridiculous to me. Every course I ever did we had access to years and years of past papers. Doing past papers is one of the best ways to study. Teaching or learning to the test is a good thing if the test is good.
Some professors would provide past exams for study aids or use them as homework problems.
However, the explicit default was that you should not look at solutions from prior years. Professors would announce at the beginning of the course that they reuse questions and that looking at prior years' solutions was an honor code violation. I think it's pretty clear it's cheating when the expectations are clearly outlined.
If you had inadvertently come across the problem before and independently solved it, you were expected to disclose that as part of your answer. I personally had to do this several times, and never suffered any negative consequences for it, but the expectation for honesty was there.
So why not just provide students with officially sanctioned practice papers that are not past papers and also guaranteed to not share questions with the actual exam?
If you write a new exam paper every year, that is also guaranteed not to share questions with the actual exam. From a professor’s point of view, many questions that are superficially different are similar enough to practically be the same question. There are many, many questions that ask about what the real cause of the French Revolution was, or test to see if you will recognize that this problem is basically a red-black tree.
At the college I went to, last year's exams were public knowledge and anyone had access to them. No one saw it as cheating to try last year's exam before taking this year's.
Yes, it means teachers have to vary the tests, but then again it gives you a repository of exercises to learn from and train on. It is just a win for learning.
This is an explicit rule announced at the beginning of most Caltech courses, so the norm is that such behavior is cheating because you were warned in advance.
As yet another housemate of Andy's at Caltech, it is perhaps worth mentioning that he was also involved in updating Caltech's CS curriculum prior to it becoming as popular as it is today.
Hard to say how much of the trend is just what happened everywhere, but CS1 being in Python rather than Scheme probably helped a bit.
Was swapping Scheme for Python a good thing though? Scheme (think SICP) is a very good way to understand the concepts of CS at a more fundamental level, and Python can be learned independently anyway.
As mentioned elsewhere in this thread, it's opt-in to avoid breaking existing behavior. But given that ingestion points are easy to identify, it's pretty straightforward to turn on (especially if you have a schema for your inputs): https://pandas.pydata.org/pandas-docs/stable/user_guide/inte...
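For example, the opt-in at an ingestion point looks roughly like this (file and column names are made up, and this assumes a pandas version with the nullable Int64 dtype):

import pandas as pd

# Hypothetical ingestion point; "events.csv" and the column names are invented.
# Asking for the nullable "Int64" extension dtype keeps integer columns as
# integers even when some values are missing (they show up as <NA>, not NaN).
df = pd.read_csv("events.csv", dtype={"user_id": "Int64", "clicks": "Int64"})
print(df.dtypes)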
I saw an implementation (a CSV parser in Julia) where the sentinel value was randomly assigned at read time (if a value in the input was equal to the sentinel value, the sentinel was changed randomly). After parsing, the sentinel values would be converted to the appropriate data type (Julia's Missing).
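Roughly the same idea as a Python sketch (not the actual Julia code, just an illustration of the trick):

import random

def parse_column(raw_values):
    # Pick a random sentinel for missing entries; if a real input value
    # happens to collide with it, pick a new sentinel and re-parse.
    while True:
        sentinel = float(random.getrandbits(52))
        parsed = [sentinel if v == "" else float(v) for v in raw_values]
        if all(float(v) != sentinel for v in raw_values if v != ""):
            break
    # After parsing, sentinel slots become a proper missing value
    # (the equivalent of Julia's Missing).
    return [None if v == sentinel else v for v in parsed]

print(parse_column(["1", "", "2.5"]))  # [1.0, None, 2.5]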
>Want to join two dataframes together like you'd join two database tables? df.join(other=df2, on='some_column') does the wrong thing, silently, what you really wanted was df.merge(right=df2, on='some_column')
Simply a matter of default type of join - join defaults to left while merge defaults to inner. They use the exact same internal join logic.
>What if they're optional? pd.DataFrame({'foo': [1,2,3,None]}) will silently change your integers to floating point values.
This was a long standing issue but is no longer true.
>Want to check if a dataframe is empty? Unlike lists or dicts, trying to turn a dataframe into a truth value will throw ValueError.
Those are 1D types where that's simple to reason about. It's not as straightforward in higher dimensions (what's the truth value of a (0, N) array?), which is why .empty exists.
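For example:

import pandas as pd

df = pd.DataFrame(columns=["a", "b"])  # shape (0, 2): no rows, two columns
print(df.empty)                        # True - any axis of length 0 counts as empty
# bool(df) would raise ValueError ("The truth value of a DataFrame is ambiguous...")
if not df.empty:
    print("have rows")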
> Simply a matter of default type of join - join defaults to left while merge defaults to inner.
No, join does an index merge. For example, if you try to join with string keys, it'll throw an error (because strings and numeric indexes aren't compatible).
left = pd.DataFrame({"abcd": ["a", "b", "c", "d"], "something": [1,2,3,4]})
right = pd.DataFrame({"abcd": ["d", "c", "a", "b"], "something_else": [4,3,1,2]})
left.join(other=right, on="abcd")
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
If you try to join with numeric keys:
left = pd.DataFrame({"abcd": ["a", "b", "c", "d"], "something": [10,20,30,40]})
right = pd.DataFrame({"abcd": ["d", "c", "a", "b"], "something": [40,30,10,20]})
left.join(other=right, on="something", rsuffix="_r")
abcd something abcd_r something_r
0 a 10 NaN NaN
1 b 20 NaN NaN
2 c 30 NaN NaN
3 d 40 NaN NaN
Or even worse if your numeric values are within the range for indexes, which kind of looks right if you're not paying attention:
left = pd.DataFrame({"abcd": ["a", "b", "c", "d"], "something": [1,2,3,4]})
right = pd.DataFrame({"abcd": ["d", "c", "a", "b"], "something": [4,3,1,2]})
left.join(other=right, on="something", rsuffix="_r")
abcd something abcd_r something_r
0 a 1 c 3.0
1 b 2 a 1.0
2 c 3 b 2.0
3 d 4 NaN NaN
Whereas merge does what one would expect:
left.merge(right=right, on="something", suffixes=['', '_r'])
abcd something abcd_r
0 a 10 a
1 b 20 b
2 c 30 c
3 d 40 d
>> What if they're optional? pd.DataFrame({'foo': [1,2,3,None]}) will silently change your integers to floating point values.
> This was a long standing issue but is no longer true.
Occurs in pandas 0.25.1 (and the release notes for 0.25.2 and 0.25.3 don't mention such a change), so that is likely still the case in the latest stable release.
>> Want to check if a dataframe is empty? Unlike lists or dicts, trying to turn a dataframe into a truth value will throw ValueError.
> Those are 1D types where that's simple to reason about. It's not as straightforward in higher dimensions (what's the truth value of a (0, N) array?), which is why .empty exists
It's not very pythonic, though. A definition of "all dimensions greater than 0" would've been much less surprising.
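A quick sketch of that definition as a helper (pandas itself doesn't define truthiness this way):

def df_truthy(df):
    # True only if every dimension is non-empty, i.e. the frame has both rows and columns
    return all(n > 0 for n in df.shape)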
> Occurs in pandas 0.25.1 (and the release notes for 0.25.2 and 0.25.3 don't mention such a change), so that would likely be still the case in the latest stable release.
Sure, if you specify the type. It's still a gotcha because the default behavior is to upcast to floating point unless the type is defined for every integer column of every data frame, which isn't very pythonic.
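e.g. the difference between the default gotcha and the explicit nullable dtype:

import pandas as pd

# Default behavior: the None forces the whole column to float64.
pd.DataFrame({"foo": [1, 2, 3, None]}).dtypes
# Explicit nullable integer dtype: stays Int64, the missing value becomes <NA>.
pd.DataFrame({"foo": pd.array([1, 2, 3, None], dtype="Int64")}).dtypes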
The example with the (incorrect) join above shows how even other operations can cause this type conversion.
Yes, there's a lot of existing code written assuming the old behavior. But most code has only a few ingestion points, so it's pretty simple to turn on.
At hedge funds, the majority of your "cash bonus" can take the form of deferred comp that's locked up in a shitty fund that doesn't perform anything like the famous ones (both Citadel and RenTec do this). Some don't let you pull it out unless you never work again.
RSU comp can be quite competitive when you take those factors into account.
Usually because the strategies which have genuine alpha and consistently outperform are capacity constrained, and so cannot be scaled to handle both.
Of course, at RenTech in particular the employee-only fund is the good fund. But that's not always the case, so the parent commenter has a point. Deferred/locked up compensation can really suck.
>Of course, at RenTech in particular the employee-only fund is the good fund
Yeah, but only the long tenured/high performing employees get access to the good fund (there's a merely average fund that most employee deferred comp goes into, if I understand correctly).
Former colleague of mine was an M&A trader at Lehman during the crash. 95% of his net worth was in his fund, which was up 50%+ for the year when the bankruptcy trustee seized everything. IIRC, he was starting to get his money back in ~2014-15.