
Just not true. Lots of oil companies have a vested interest in renewables now. A lot of the infrastructure for offshore oil is being redeployed for offshore wind.


The New York Times is a glorified blogging platform. Not too long ago it was a WordPress site.

I'm fully aware of how jarring it is for the median HN reader to hear this, but maintenance of a news website isn't the kind of skilled labour that commands a 250k-a-year paycheck anymore.


Perhaps, but they do serve their blog at scale, including video and interactive widgets. They’re the most popular of the news blogs.

It’s not rocket engineering but it’s not nothing.


Linux outages and upgrade fuckups happen all the time. The difference is that Linux isn't on a unified upgrade cycle, so issues come as a trickle rather than a deluge; the problems are usually localised to one corporation and don't make the news.

Also Linux fanboys will usually blame the system admin for not configuring things properly if things break: "it's not the operating system, it's <something stolen from OpenBSD>".

At the end of the day, Linux is only popular because of the inertia UNIX had on mini-computers/servers. For standard end users GNU/Linux is light-years behind Windows and macOS in terms of usability and stability.


Wait until you learn that they believe they need Kafka. Their engineers are probably bitter they work at a media company and not a FAANG.

https://www.confluent.io/en-gb/blog/publishing-apache-kafka-...

> The Monolog is our new source of truth for published content. Every system that creates content, when it’s ready to be published, will write it to the Monolog, where it is appended to the end.

> The Monolog contains every asset published since 1851. They are totally ordered according to publication time. This means that a consumer can pick the point in time when it wants to start consuming. Consumers that need all of the content can start at the beginning of time (i.e., in 1851), other consumers may want only future updates, or at some time in-between.

> As an example, we have a service that provides lists of content — all assets published by specific authors, everything that should go on the science section, etc. This service starts consuming the Monolog at the beginning of time, and builds up its internal representation of these lists, ready to serve on request. We have another service that just provides a list of the latest published assets. This service does not need its own permanent store: instead it just goes a few hours back in time on the log when it starts up, and begins consuming there, while maintaining a list in memory.

Absolutely insane. The only reason this works is that the NYT publishes fewer than 300 articles per day, so you can get away with doing un-indexed full table scans of your entire database. But the engineers get to put "I created a log-based time-series architecture" on their resumes.
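
For reference, the pattern they describe boils down to "rebuild your read model by replaying an append-only log from offset 0". A toy sketch, with an in-memory log rather than anything resembling their actual Kafka setup, is about all it takes:

    // Toy sketch of the "consume from the beginning of time" pattern:
    // an append-only log of published assets, replayed to rebuild a per-author index.
    use std::collections::HashMap;

    struct Asset { author: String, title: String }

    fn main() {
        // The "Monolog": every asset ever published, in publication order.
        let log: Vec<Asset> = vec![
            Asset { author: "A. Reporter".into(), title: "First story".into() },
            Asset { author: "B. Columnist".into(), title: "Second story".into() },
        ];

        // A consumer starts at offset 0 and folds the log into its own read model.
        let mut by_author: HashMap<String, Vec<String>> = HashMap::new();
        for asset in &log {
            by_author.entry(asset.author.clone()).or_default().push(asset.title.clone());
        }

        println!("{} authors indexed", by_author.len());
    }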


Thanks for the thorough response.

But firstly:

> If your firm has had KDB for ages there's a good chance it's big enough to be signed up to one of the research groups who maintain a set of test-suites they will run over a vendor's latest hardware offering, letting them claim the crown for the fastest Greeks or something. If your firm is a member you may be able to access the test-suites and look at how the data in the options tests is being written and read, and there are quite a few, I think.

Unfortunately my firm isn't that big: ~150 people in total and maybe ~40 developers, of which there are 2 full-time KDB devs whose job is mostly maintaining the ingestion and writing some fairly basic functions like `as_of`. There are only two people who work on our options desk as developers, so there's a lack of resourcing for KDB. When I have these performance issues with KDB, it's quite hard to get support within my firm from the two KDB devs (one of whom is very junior).

> I've never worked on options data and so can't opine on the problems it presents

The thing about options data is that it's generally lower-frequency but a lot more complex. If spot data is 1-dimensional and futures data is 2-dimensional, options are 3-dimensional. You also have a lot more parameterizations, which leads me to the second point :)

> you may not want to load data out of KDB a day at a time. Try to do the processing in KDB

Just to give you a very specific example of the processing I need to do. I have a data structure in KDB like this (sort of typescript notation):

     row = mapping<datetime, { a: number, b: number, mu: number, sigma: number, rho: number }>

This is a vol surface. To convert it into a volatility requires:

    f = log_moneyness - mu;
    total_var = a + b * (rho * f + (f * f + sigma * sigma).sqrt());
    vol = (total_var / time).sqrt();

Then, in order to calculate the log_moneyness, I need to calculate the forward price from an interest rate, which is a bit more straightforward.
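
To make that concrete, here's roughly the shape of it on the Rust side. This is my own sketch with made-up names, not our actual library; it assumes the params above describe an SVI-style slice and a flat rate for the forward:

    // Sketch only: one surface slice, parameters matching the KDB row above.
    struct VolSlice { a: f64, b: f64, mu: f64, sigma: f64, rho: f64 }

    impl VolSlice {
        // strike -> implied vol, given the forward and time to expiry in years
        fn implied_vol(&self, strike: f64, forward: f64, time: f64) -> f64 {
            let log_moneyness = (strike / forward).ln();
            let f = log_moneyness - self.mu;
            let total_var = self.a
                + self.b * (self.rho * f + (f * f + self.sigma * self.sigma).sqrt());
            (total_var / time).sqrt()
        }
    }

    // Forward from spot and a flat continuously-compounded rate, just for illustration.
    fn forward(spot: f64, rate: f64, time: f64) -> f64 {
        spot * (rate * time).exp()
    }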

Now I have a base from which I can start generating data like the delta, but that also requires a fair bit of math.

I was pulling this stuff out of KDB because I already had my code in rust that does all of this.

> You said you're scared to do queries that return a lot of data, and that it often freezes. Are you sure the problem is at the KDB end?

Yeah, I'm pretty sure in my case. We have some functions, written by the KDB guys, designed for getting data out. Even functions that return 30-something rows, like an as_of query, take ~10s.


The volatility calculation looks like it should be doable in q/k. I'm not sure about the more complicated stuff, but at the end of the day it's a general-purpose language too, so anything is possible. KDB being columnar means thinking in terms of rows can often be slower. Sounds like you have a keyed table? If the KDB guys you have aren't that good/helpful, you could check out some other forums. Could be useful in the future to be able to circumvent issues you have with the kdb devs.


What did you do for your backtesting?


So I’ve actually contributed to this project.

My concern with adding custom libraries into KDB, while better than writing duplicative Q code, is the maintenance nightmare of keeping them up to date in KDB.

It’s still an investment, but I need to be aware of the risks and downsides.


Loading your rust code into your existing KDB data lake and periodically updating it will be a significantly smaller lift than rewriting your data lake.

It sounds like you are some sort of Quant Dev on a desk, and so it's really up to you what you want to do. If you push against the grain to do a data lake rewrite, you'll own the time/effort/outcome of a big Data Engineering project. So you better be very right and also very fast.

If you are looking for solutions within your existing data lake, I've posted up a few sources / thoughts for you to get on and do your Quant Dev work.

You sound very set on some sort of rewrite, so you should do what your heart desires. Just make sure you deliver value to your desk.


> Loading your rust code into your existing KDB data lake and periodically updating it will be a significantly smaller lift than rewriting your data lake.

This is probably going to be what I do until KDB creaks over.

> You sound very set on some sort of rewrite

I vacillate between the two things. I'm personally used to data engineering with Parquet and Spark, which are widely used outside of finance and don't have expensive vendor lock-in.

And then I realise that I'd have to own this stuff, and my job isn't data engineering; I'm a quant dev.


If it’s partitioned this should be even faster.


> What does "move the code to the data" mean in practice?

This gets at why I’m finding KDB so hard to use. I’ve written a pricing and risk library in Rust. Historical data really needs to be pulled out and processed in Rust rather than in KDB.


> Unless there's some significant new data that might change that decision, it is what it is.

This is what I’m grasping at.

Are the challenges of building a KDB system for backtesting derivatives data, which needs to work in tandem with a Rust pricing library, substantially different from those of a KDB system engineered for backtesting spot data?

One has complex and specific math and one doesn’t.


I'm not sure if anyone's yet suggested that you embed your code as a library for KDB that it could load dynamically? There's some pointer walking fun involved which Rust may _hate_, but it's not that hard, and after that you'd be left with the numerical arrays you're interested in.
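
The pointer walking is roughly this kind of thing. A stand-in sketch (not the real k.h layout, just a pointer and a length, which is what you're left with once you've checked the vector type):

    // Stand-in for a q float vector: the real definition lives in k.h.
    struct FloatVec { data: *const f64, len: usize }

    // View the q-owned buffer as a Rust slice without copying.
    // Caller must guarantee the pointer/length stay valid for the slice's lifetime.
    unsafe fn as_slice<'a>(v: &'a FloatVec) -> &'a [f64] {
        std::slice::from_raw_parts(v.data, v.len)
    }

    fn main() {
        let owned = vec![0.21, 0.22, 0.25];
        let v = FloatVec { data: owned.as_ptr(), len: owned.len() };
        let vols = unsafe { as_slice(&v) };
        println!("mean vol: {}", vols.iter().sum::<f64>() / vols.len() as f64);
    }

On the q side you'd then load the exported function with 2: and hand it the columns you care about.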


It sounds like you're not given the tools to do your job.

