
physguy1123 · on Oct 16, 2018

Timber Hill did not participate in the retail business, to avoid a conflict of interest with IB.

For what it's worth, one can't internalize options orders the same way one can equity orders; purchased options flow must make it to the market. I'm not familiar with the history of this decision, but I suspect it's because options are less liquid and have wider spreads, so internalization would get customers a much worse deal than, say, auctions. You can get rebates from certain exchanges for initiating auctions on them, and can selectively initiate auctions on the exchanges that benefit you more, but the options orders themselves make it to the market.

jbeyer · on Oct 16, 2018

Quite regularly, I see that my orders are filled at Timber Hill, when there are other participants offering at the same price. We may be quibbling over the definition of "internalization", but that is internalization.

physguy1123 · on Oct 16, 2018

It is strictly not internalization, in fact. As far as I know from friends working on retail at the new owner of Timber Hill, IB and Timber Hill never cooperated on optimal routing strategies, since Thomas Peterffy didn't want people losing faith in IB's routing. There are plenty of reasons why Timber Hill might win the trade:

  * They might be first in the queue on a price-time exchange, and so naturally get the order

  * They might be on a price-size exchange, and so get some of the fill simply by being present.

  * They probably participate in the auction process, and will naturally get some portion of orders that IB initiates

For what it's worth, Timber Hill USA was bought by Two Sigma ~1 year ago. It wasn't actually unprofitable, as public filings show, but it wasn't making much and Thomas Peterffy didn't want the appearance of a conflict of interest.

jbeyer · on Oct 19, 2018

What you say about them "winning" the trade is not accurate. If it were happening on an exchange, as in each of your three examples, the trade would show as being on that exchange, and I wouldn't know who the counterparty was. If the exchange is "Timber Hill", then IB chose to route the order there, which is internalization.

physguy1123 · on Oct 19, 2018

If you're talking about the USA, this is incorrect. I actually work in the industry and know how this works.

* One has to initiate an auction of the client order, at least in the USA - see https://www.sec.gov/rules/concept/34-49175.htm#P193_52817, specifically "Unlike internalization in the over-the-counter equity market, the options exchanges' rules permit a firm to trade with its own customer's order only after an auction in which other members of that market have an opportunity to participate in the trade at the proposed price or an improved price. This auction provides some assurance that the customer's order is executed at the best price any member in that market is willing to offer". In practice the placing firm will bid for the auction at that price, and non-competitive prices don't get posted since then the firm has to lose money.

* Counterparty data is reported by the OCC, so regardless of where the order was executed you will know who the counterparty was. It's fairly easy for IB to know whether Timber Hill executed an order, whether or not they intentionally routed it there.

Second, I actually have inside knowledge of a sort about how Timber Hill USA worked before getting purchased by Two Sigma. They did not work with IB to take customer orders, and they weren't too happy when they did get customer orders that originated from IB, specifically because people then run around complaining that IB is up to spooky business internalizing with Timber Hill.

physguy1123 · on Oct 15, 2018

In this specific case, it means that retail flow is much less likely to immediately move the market than flow from an informed (knows the market will move soon) or large (will trade so much that it will move the market) player, and is much more valuable to market makers who try to capture the spread on that order (and usually improve it) and hedge/trade out of that risk relatively soon.

physguy1123 · on Oct 2, 2018

On x86, push/pop benefit from a dedicated hardware optimization known as the stack engine, which performs most of the rsp increments/decrements and passes those offsets to the decoder instead of using execution slots on them. push/pop are also much smaller than the corresponding mov/add instructions.

It's much more efficient to use a series of push/pops for smaller operations, like saving registers before a call, than to manually adjust rsp and store onto the stack.

While technically this is still incrementing/decrementing a register and storing, the amount of ISA/hardware support for such things clearly demonstrates that the x86 ISA and modern x86 hardware give special treatment to the stack.

physguy1123 · on Sept 4, 2018

There are three carefully crafted posts in new where commenters mention this market; it's clearly some kind of ad attempt.

physguy1123 · on Sept 4, 2018

What sort of benefits have you seen from something like this in real software? I tried some techniques like this before, but in the end they didn't have much of an impact.

vgatherps · on Sept 4, 2018

I've gotten mean server response times to go down by ~30% in the most extreme case, but the improvement was usually in the ~10-15% range. The real benefits came from shrinking tails that came about when some useless cache junk got rid of ALL the important data.

My use case was also preventing garbage processes from thrashing the cache, so the process using nontemporal stores was not the important process. A proof-of-concept merging of the two showed the same benefits from nontemporal stores, and the same principle applies anyways.

physguy1123 · on Aug 20, 2018

For me, it's ~45-50fps@4k for a good set of games, and often ~100-120fps@144hz.

$1000 is a steep price for the 2080 Ti, although depending on the 2080's performance it might be the only card which can and will continue to reliably hit 60fps@4k.

Not upping the RAM with such a huge price increase is absolutely absurd.

ihuman · on Aug 20, 2018

> ~100-120fps@144hz

Do you mean @1440p?

physguy1123 · on Aug 5, 2018

> even though by definition people are overvaluing whatever is in the bucket

By that do you mean that ETFs which people might buy are overvalued compared to the stocks which compose the ETF? Or that overinvesting in the ETF results in the stocks getting overvalued?

I ask because, as somebody who works on ETF market-making, I can say that no liquid ETF is trading outside of a reasonable fair-value spread based on the underlying components, unless you count situations where the markets for the underlyings are closed, so there's no exact fair value.

physguy1123 · on Aug 5, 2018

Individual investors don't need to pick single stocks to benefit from growth. These companies also don't have a presence in the various ETFs and mutual funds that people can invest in, and it's not unreasonable to think that a wider selection of, say, small-cap growth companies might make for a better selection of small-cap stocks to put in an index.

physguy1123 · on July 31, 2018

In my experience, writing Java (or Groovy here) in C++ results in horribly slow code which the JVM runs circles around, and it sounds like that's the problem your employee ran into.

> But for all the applications where the high performance code is in niches at the edge and there simply aren't resources or expertise to fully tune the native implementation

It's interesting you say this, because in my experience it's the JVM which requires absurd amounts of tuning and native programs which are much more consistent. The proper and easier way that native programs are written lends itself to fairly respectable performance, mostly because the object and stack model of say C or C++ is so much friendlier to the CPU than in most dynamic languages.

In general, for all that I hear statements along this line, I've only twice seen code to back it up, and the C was so de-optimized from the OCaml version that I suspect it was intentional - the author (same for each) was a consultant for functional languages, and in one case switched the C inner loop to use indirect calls for every iteration and in the other switched the hash function between the C and functional comparison.

shub · on July 31, 2018

In addition, a lot of the techniques used to write high-performance Java boil down to "write it like C". Avoid interfaces, avoid polymorphic virtual calls (as you can't avoid virtuals entirely), avoid complex object graphs, avoid allocating as much as possible...it's not nearly as nice as naive Java. Still nicer than C IMO. If your process segfaults you can know for certain that it's a platform bug.

zmmmmm · on July 31, 2018

The other thing that makes Java nicer than C is the ease and depth with which you can profile it to discover where the bottlenecks actually are. While it's certainly possible to profile in both cases, the runtime reflective and instrumentation capabilities of the JVM really add a lot of power to it.

repolfx · on July 31, 2018

There's this classic paper from Google that runs an optimisation competition on the same program written in C++, Java, Scala and Go:

https://days2011.scala-lang.org/sites/days2011/files/ws3-1-H...

physguy1123 · on Aug 2, 2018

This is a great benchmark of the fundamental problems with, say, Java: the code itself is fairly simple and the JITs probably generate optimal code given their constraints, but the performance problems clearly show that the GC and pointer chasing really hinder your performance.

If you add in cases where SIMD, software prefetching, or memory access batching help, the difference will only grow.

weberc2 · on July 31, 2018

It’s not native vs VM, but rather “has stack semantics/value types” vs “no stack semantics/value types”. In particular, OCaml’s standard implementation is native, not a VM.

Also worth calling out Go, which is rather unique in that it has stack semantics but it also has a garbage collector, so it’s kind of the best of both worlds in terms of ease of writing correct, performant code.

pjmlp · on July 31, 2018

Go is not rather unique in having GC and stack semantics; there are plenty of languages that have both, all the way back to Mesa/Cedar and CLU.

weberc2 · on July 31, 2018

I should have been more clear I guess; I was comparing it to other popular languages. Few have value types and many that do (like C#) regard them as second-class citizens.

ernst_klim · on July 31, 2018

But Go has an imprecise GC (in the reference implementation) or stack maps (in gccgo), so the GC overhead is rather huge. It also lacks compaction, so its cache-miss behavior isn't great either.

weberc2 · on July 31, 2018

Not sure what you mean by imprecise, but Go’s GC does trade throughput for latency. The overhead still isn’t huge if only because there is so much less garbage than in other GC languages. I’m also surprised by your cache misses claim; Go has value types which are used extensively in idiomatic code so generally the cache properties seem quite good—maybe my experience is abnormal?

ernst_klim · on July 31, 2018

>Not sure what you mean by imprecise

It's a rigid term:

https://en.wikipedia.org/wiki/Tracing_garbage_collection#Pre...

perf shows how much time the GC eats, and it's quite a lot. Thus in the majority of benchmarks Go lags behind Java, or is on par with it at best.

>there is so much less garbage than in other GC languages

That is not true, since strings and interfaces are heap allocated; thus the only stack-allocated objects are numbers and very simple structs (i.e. ones which contain only numbers). So you would have a lot of garbage unless you are doing number crunching, which could easily be optimized by inlining and register allocation anyway.

weberc2 · on July 31, 2018

> It's a rigid term

Ah, neat! I learned something. :)

You’re mistaken about only numbers and simple structs being stack allocated. All structs are stack allocated unless they escape, regardless of their contents. Further, arrays and constant-sized slices may also be stack allocated. I’m also pretty sure interfaces are only heap allocated if they escape; in other words, if you put a value in an interface and it doesn’t escape, there shouldn’t be an allocation at all.

ernst_klim · on July 31, 2018

Both arrays and interfaces are heap allocated. A slice is just a pointer to a heap-allocated array.

A structure could be stack allocated, but any of its fields would not be, if the field is anything but a number.

A trivial example:

https://segment.com/blog/allocation-efficiency-in-high-perfo...

    package main
    
    import "fmt"
    
    func main() {
            x := 42
            fmt.Println(x)
    }

    ./main.go:7: x escapes to heap

So a trivial interface cast leads to allocation.

weberc2 · on July 31, 2018

Looks like you're right about interfaces (full benchmark source code: https://gist.github.com/weberc2/87d2fdc379065a2765d1c9f490ad...)!

    BenchmarkEscapeInterface-4        50000000   33.3 ns/op  8 B/op  1 allocs/op
    BenchmarkEscapeConcreteValue-4    200000000  9.45 ns/op  0 B/op  0 allocs/op
    BenchmarkEscapeConcretePointer-4  100000000  10.0 ns/op  0 B/op  0 allocs/op

But arrays are stack allocated:

    BenchmarkEscapeArray-4  50000000   21.3 ns/op  0 B/op  0 allocs/op

And structs are stack allocated, as are their fields--even fields that are structs, slices, and strings!:

    BenchmarkEscapeStruct-4  100000000  12.8 ns/op  0 B/op  0 allocs/op

The code:

    type Inner struct {
    	Slice  []int
    	String string
    	Int    int
    }
    
    type Struct struct {
    	Int    int
    	String string
    	Nested Inner
    }
    
    func (s Struct) AddThings() int {
    	return s.Int + len(s.String) + len(s.Nested.Slice) + len(s.Nested.String) +
    		s.Nested.Int
    }
    
    func BenchmarkEscapeStruct(b *testing.B) {
    	for i := 0; i < b.N; i++ {
    		s := Struct{
    			Int:    42,
    			String: "Hello",
    			Nested: Inner{
    				Slice:  []int{0, 1, 2},
    				String: "World!",
    				Int:    42,
    			},
    		}
    		_ = s.AddThings()
    	}
    }

ernst_klim · on July 31, 2018

I'm sure your strings are not stack allocated; they are statically allocated (and would be statically allocated in any language). Not sure about arrays, but dynamic arrays should be dynamically allocated too; your arrays are probably static. They would be heap allocated if you used make.

weberc2 · on July 31, 2018

It doesn't matter whether they're stack allocated or statically allocated; neither is garbage, contrary to the original claim ("Go generates a lot of garbage except when dealing with numeric code"). The subsequent supporting claims ("structs with non-numeric members are heap-allocated", "struct fields that are not numbers are heap allocated", etc) were false: sometimes non-numeric members are heap allocated, but they're often not allocated at all, and never because they're non-numeric; and their container is never heap allocated on the basis of where the member data lives.

I think this matter is sufficiently resolved. Go trades GC throughput for latency and it doesn't need compaction to get good cache properties because it generates much less garbage than traditional GC-based language implementations.

ernst_klim · on July 31, 2018

>It doesn't matter whether they're stack allocated or statically allocated

It does. Any language could do static allocation; Go is no different from Java here. The problem is that in any real code nearly all your strings and arrays would be dynamic, thus heap allocated, as well as interfaces. Consider also that allocations in Go are much more expensive than in java or haskell.

weberc2 · on July 31, 2018

We're talking past each other. My claim was that Go doesn't need compaction as badly as other languages because it generates less garbage. You're refuting that with "yeah, well it still generates some garbage!". Yes, strings and arrays will often be dynamic in practice, but an array of structs in Go is 1 allocation (at most); in many other languages it would be N allocations.

> Consider also that allocations in Go are much more expensive than in java or haskell.

This is true, but unrelated to cache performance, and it's also not a big deal for the same reason--allocations are rarer in Go.

EDIT:

Consider `[]struct{nested []struct{i int}}`. In Go, this is at most 1 allocation for the outer array and one allocation for each nested array. In Python, C#, Haskell, etc, that's something like one allocation for the outer array, one allocation for each object in the array, one allocation for each nested array in each object, and one allocation for each object in each nested array. This is what I mean when I say Go generates less garbage.

ernst_klim · on July 31, 2018

>Consider `[]struct{nested []struct{i int}}`.

A typical example, yeah. I've said about structs of ints already; unfortunately it's not a common type anywhere beyond number crunching, at which Go sucks anyway.

In haskell you could have unboxed array with unboxed records. Check Vector.Unboxed.

weberc2 · on July 31, 2018

> I've said about structs of ints already

Yeah, but you were wrong (you said other kinds of structs would escape to the heap). The innermost struct could have a string member and a `*HeapData` member; it wouldn't matter. The difference in number of allocations between Go and others would remain the same. The difference isn't driven by the leaves, it's driven by number of nodes in the object graph; the deeper or wider the tree, the better Go performs relative to other GC languages.

> In haskell you could have unboxed array with unboxed records. Check Vector.Unboxed.

For sure, but in Go "unboxed" is the default (i.e., common, idiomatic); in Haskell it's an optimization.

yxhuvud · on July 31, 2018

Regarding your last point, Crystal has the same features as Go in that regard, while at the same time being vastly more expressive. This is mostly due to the standard library in Crystal being so nice for working with collections (which perhaps isn't surprising, as the APIs are heavily influenced by Ruby). Blocks being overhead-free is another necessary part for this to work well.

weberc2 · on July 31, 2018

Yeah, I often find myself wishing Go's type system were a bit better, but the reason I prefer it is because it's fast, easy to reason about, and the tooling/deployment stories are generally awesome (not always though--e.g., package management). So far I'm only nominally familiar with Crystal; I'll have to look into it sometime.

tybit · on July 31, 2018

.NET is another example of value types in a garbage collected language. It’s also somewhat unique afaik in doing so within a VM.

weberc2 · on July 31, 2018

Definitely. I’m sad that they’re not more idiomatic in C#. I definitely prefer values and references over OOP class objects.

physguy1123 · on July 31, 2018

SIMD, multicore, and caches don't just magically happen with better compilers. SIMD requires very specific memory access and computation patterns, and cache-friendly code has similar restrictions. The features of even basic JavaScript fly in the face of code being SIMD and cache friendly, except for the most trivial programs.

Automagically parallelizing general serial code isn't feasible on any hardware similar to modern CPUs, and probably will never mesh well with fast single-threaded performance (communication and synchronization in hardware is HARD).