Show HN: Staffjoy V2, now open source (github.com/staffjoy)
113 points by philip1209 on Feb 27, 2017 | 35 comments


I announced two weeks ago that we are shutting down Staffjoy [1] and open sourcing our code. Last week, our primary V1 repo was submitted to HN [2]. Yesterday, I open-sourced our V1 microservice Chomp, which computes shift scaffolds from forecasts [3]. I also published our YC Fellowship application [4] and pitch decks [5] on our blog.

Today we opened up the V2 repo of Staffjoy, which contains a microservice architecture, detailed here [6]. V2 is a monorepo, meaning that all of its code lives in a single repository. It is the largest and most sophisticated of our codebases.

You can learn about the difference between V1 (autoscheduling of hundreds of workers) and V2 (Excel replacement with text messages for small businesses) in our design journey blog post [7].

We are continuing to open source our remaining repos, which are mainly V1 microservices. Specifically, the V1 Cron microservice, the V1 mobile applications (iPhone + Android in React Native), the V1 Scheduler microservice (our first scheduling algorithm), and the V1 Mobius microservice (which solves an assignment problem) will be open-sourced in the coming days.

If you have any questions, please let me know! I'm providing contract support for anybody wishing to deploy, modify, or customize Staffjoy code through Moonlight [8].

[1] Shutdown announcement: https://news.ycombinator.com/item?id=13647382

[2] V1 "Suite": https://news.ycombinator.com/item?id=13730488

[3] Chomp service: https://github.com/staffjoy/chomp-decomposition

[4] YCF Application: https://blog.staffjoy.com/staffjoys-yc-fellowship-applicatio...

[5] Pitch decks: https://blog.staffjoy.com/staffjoys-pitch-decks-that-raised-...

[6] V2 Architecture: https://blog.staffjoy.com/staffjoys-v2-architecture-9d2fcb40...

[7] V1 to V2 Design Journey: https://blog.staffjoy.com/staffjoy-v2-ca15ff1a1169#.ttejeklw...

[8] Contract support: https://www.moonlightwork.com/staffjoy


Philip, I'm sure going through all this as part of your shutdown isn't easy, but it's really great that you're taking the time to do it. I'm sure there are many who will benefit from all of this being open source. Many thanks and best of luck going forward!


Thanks! Please reach out if I can be a resource too.


That's a really great pitch deck, Philip. Thanks for sharing.

I have to admit - I don't totally understand why you pivoted for your V2. Couldn't you have just added on the "SMS" feature to your existing product?


We had some failed deploys due to usability issues. We also felt like we were starting to converge on the same set of features as a bunch of other companies (Homebase, When I Work, Humanity/ShiftPlanning, Deputy, etc.). We decided that we needed to do a lot more user research in order to understand customer needs and competitors' products. (We also started to realize that, to do algorithmic scheduling, we would need a portfolio of algorithms - not one generic one - which was a tough job for a two-engineer team.)

Partly, we decided to focus on a great scheduling experience rather than building lots of features quickly. The SMS feature came out of that user research and a need in the market.

Our strategy was going to be integrations. We had a slew of API integrations lined up for POS, payroll, and other providers.


Thanks for open sourcing this.

Just a heads up — your phone numbers are in the SMS service.


Ugh, I don't think it's worth force pushing to erase. I pushed a commit to remove them, though. Thanks.


Thank you so much for sharing your architecture and all that you've learned throughout your journey.

Curious - with a working web app, what was the motivation for building an external REST API?


In hindsight, what would you have done differently with the architecture of this software?

It's got a lot of moving parts - do you think it's overengineered?


It's overengineered for today. The datastore should have been a monolith. However, splitting up accounts and companies made us think in terms of microservices. We were prepared to add a bunch of other messaging services on the backend behind the bot service, so I think that architecture was forward-looking. Faraday was my favorite piece of software there.


We share your sentiment that a lot of our user-facing errors could be limited by moving away from weak typing (our current stack is Ruby). In hindsight, did Go and protobuf serve you well? We are evaluating Go and possibly Elixir as a sweet spot between static and dynamic typing.


Yes. I would always use gRPC-gateway in the future. It's an amazing library. Check out the `protobuf` folder - we basically defined the whole API there in a language-agnostic way. For client libraries, you can autogenerate gRPC definitions quickly. For servers, gRPC-gateway generates a Swagger definition, which meant that we always had 100% up-to-date API docs, including a Postman-style way to execute calls. The auto-generated API docs sped up front-end development so much.
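
To make that concrete, here is a minimal sketch of how a grpc-gateway proxy gets wired up in Go. The service name and generated package path are hypothetical, not the actual Staffjoy code:

    package main

    import (
        "context"
        "log"
        "net/http"

        "github.com/grpc-ecosystem/grpc-gateway/runtime"
        "google.golang.org/grpc"

        pb "example.com/protobuf/company" // hypothetical generated package
    )

    func main() {
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()

        // The JSON/REST <-> gRPC translation layer, generated from the
        // protobuf service definition by protoc-gen-grpc-gateway.
        mux := runtime.NewServeMux()
        opts := []grpc.DialOption{grpc.WithInsecure()}
        if err := pb.RegisterCompanyServiceHandlerFromEndpoint(ctx, mux, "localhost:9090", opts); err != nil {
            log.Fatal(err)
        }
        log.Fatal(http.ListenAndServe(":8080", mux)) // REST clients hit this port
    }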

I think that access to the docs requires logging in, so here are some screenshots: http://imgur.com/a/R0AvB

Jason Chen is using Elixir/Phoenix for his new project, by the way, and thinks that it's the future for Rails developers.


gRPC-gateway looks very attractive. Thank you for posting this for the rest of us to learn from, a brilliant effort for sure.


Its documentation is horrible for getting started. Let me know if you have issues.


What are your thoughts on the rewrite in hindsight - would you have done it differently looking back on it? Perhaps gone for a more 'monolithic' approach?


Yes, in hindsight I would have built more of a monolithic datastore with fewer services. Faraday for auth was amazing - we could put anything we wanted behind it and centralize authentication/authorization. I liked how configurable Kubernetes was for everything from secrets to environments. Managing service discovery errors was annoying. The monorepo was a big win - I would always do this in the future.


To what extent did you try to refactor your Python code/architecture prior to adopting Go for V2? Also, could you share one example of the dynamic-typing-related problems? Specific metrics would be helpful for understanding V1's limitations. Thanks for considering my question!


I talked about this a little bit more on reddit [1] - basically, running Python code in production presented a lot of challenges.

We did a lot of design research prior to starting V2, and we realized that we needed to make huge changes to our data normalization. For instance, in V1 we assumed that workers could only have one "role", but we realized that people often rotate between many jobs. Making this change to the API while maintaining support for existing users would have been really difficult.
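
A rough illustration of that normalization change (field and type names are mine, not the actual schema):

    // V1 baked the single-role assumption into the worker record itself.
    type WorkerV1 struct {
        ID     int64
        RoleID int64 // exactly one role per worker - hard to undo later
    }

    // V2-style normalization: a join table lets a worker rotate between roles.
    type RoleMembership struct {
        WorkerID int64
        RoleID   int64
    }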

So, we basically started looking at all of the work we would need to do for a data normalization change, and said "well wait - if we were to build it from scratch, how long would that take?" That conversation also made us realize that we could jettison a lot of unneeded features if we started a V2.

We chose Go to address some of the issues that we had with Python, including ease of deployment, ease of operations, and the lack of static typing.

[1] https://www.reddit.com/r/Python/comments/5w3o6t/staffjoy_v1_...


Real-world software can be messy because sometimes you just have to get stuff done within time constraints.


Yes. Protocol buffers and grpc-gateway made it easy to scaffold out a quick API, write all the interior logic, and know that strong typing would prevent any major issues. We used Gogoproto to annotate the protobuf files so that database CRUD is easy too.
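
Roughly, the workflow looks like this - protoc generates the request/response types and the service interface, and the server only supplies the interior logic (the message and service names here are hypothetical):

    // companyServer implements the interface that protoc generated from
    // the .proto service definition; the compiler enforces every signature.
    type companyServer struct{}

    func (s *companyServer) GetCompany(ctx context.Context, req *pb.GetCompanyRequest) (*pb.Company, error) {
        // Only business logic lives here - marshaling, routing, and the
        // REST mapping are all handled by generated code.
        return &pb.Company{Uuid: req.Uuid, Name: "Acme Corp"}, nil
    }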


Having worked with the Flask/Python ecosystem and Go, do you think it might have been better to continue with Python for V2, since you were already familiar with it (even though Go is statically typed)?

Also, do you think Python 3.6's static type checking with mypy would have eliminated the need for Go?

Another question: for algorithmic scheduling, why did you use Julia rather than Python libraries, given that your V1 was written in Python?


No, I was having issues with too much "magic" in Python libraries. I'm continuing to use Go for my new company.

In particular, running SQLAlchemy was a nightmare (see a reddit comment I made about it here [1]). I also wanted a more parallel language - I was tired of having to use queues to run parallel tasks. Go's static typing made everything easier, from auto-generating Swagger definitions to having usable code documentation via Godoc. Most of our runtime errors in Python were due to type mismatches that a compiler could have caught.
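
For a sense of what "a more parallel language" buys you: fan-out that would typically need a task queue in a Python web stack is a few lines of standard-library Go. A minimal sketch:

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        workerIDs := []int{101, 102, 103}
        var wg sync.WaitGroup
        for _, id := range workerIDs {
            wg.Add(1)
            go func(id int) { // each notification runs concurrently
                defer wg.Done()
                fmt.Printf("notifying worker %d\n", id)
            }(id)
        }
        wg.Wait() // block until every goroutine finishes
    }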

I'm unfamiliar with Python 3.6 static checking. I'll have to look into it.

For the algorithmic scheduling - our algorithms in Julia predate the existence of a web app. We had customers using the Julia-based algorithm via the protocol "spreadsheets over email". The website came later, and we ended up switching the algorithms from Julia to Python after the web-app launch [2].

[1] https://www.reddit.com/r/Python/comments/5w3o6t/staffjoy_v1_...

[2] https://blog.staffjoy.com/retro-on-the-julia-programming-lan...


Thanks for the reply and insight.

Why did you have issues with magic in Python libraries? As I understand it, one of the main design goals of Python and its libraries is "explicit is better than implicit" (PEP 20 [1]).

Indeed, for our own project we chose Python over Ruby and Ruby on Rails precisely because of PEP 20 [1].

Your blog posts and move to Go made us wonder whether something is wrong with the Python ecosystem.

We tried Go, but dropped it because of too much boilerplate code and the quality of the libraries. When it comes to interacting with a database, especially PostgreSQL, Go's speed advantage is not very useful, since most tasks are I/O-intensive. Also, Go's PostgreSQL driver is not as mature as psycopg. For CPU-intensive tasks, we relied on scientific Python (which is written in C), and it worked well with multiprocessing and asyncio.

Also, regarding the specific race conditions you encountered when handling updates across multiple requests: we solved this by using SQLAlchemy's with_for_update() [2], which takes care of locking at the database level to make sure transactions are handled properly. Since the database we use is PostgreSQL, we wanted the database to handle ACID compliance rather than implementing it in application code.
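
The underlying mechanism is just SELECT ... FOR UPDATE inside a transaction. For Go readers, a hedged sketch of the same pattern with database/sql (table, column, and variable names are illustrative):

    // Assumes db *sql.DB, ctx, shiftID, and delta are already in scope.
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op if the transaction commits

    var hours int
    // FOR UPDATE locks the row until the transaction ends, so concurrent
    // updates serialize at the database rather than in application code.
    err = tx.QueryRowContext(ctx,
        "SELECT hours FROM shifts WHERE id = $1 FOR UPDATE", shiftID,
    ).Scan(&hours)
    if err != nil {
        return err
    }

    if _, err = tx.ExecContext(ctx,
        "UPDATE shifts SET hours = $1 WHERE id = $2", hours+delta, shiftID,
    ); err != nil {
        return err
    }
    return tx.Commit()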

[1] https://www.python.org/dev/peps/pep-0020/

[2] http://docs.sqlalchemy.org/en/latest/core/selectable.html#sq...


Interesting to see how other companies' application stacks end up looking. Thanks for releasing this.


Their React app definitely has a lot that could be done better. It is nice seeing what other people come up with in big orgs and learning from what they have written. Sometimes I have imposter syndrome to the max, but this sort of codebase I can actually read.


Yes, and we had a bunch of issues getting JavaScript to work with Bazel. We didn't realize that our components would be split up, which caused a lot of problems and required a rewrite.


Their Go services definitely read like a very beginner or prototype project. There are no tests, lots of global state, and little separation of concerns between request/response handling, business logic, etc.


Yes, it was a rush job. It started off nice, with tests and everything. Then we had to hit a deadline. I wrote 100% of the backend code - there were no other authors. Writing tons of functionality without time to go back and refactor as my knowledge improved sucked, and there were a lot of other non-code responsibilities at the time. If I had had the time, I would have spent a lot more effort improving this code. For instance, I discovered gRPC interceptors way too late :-)
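
(For anyone unfamiliar: a gRPC unary server interceptor is just a function that wraps every handler, which is where cross-cutting concerns like logging and auth belong. A minimal sketch, assuming the usual context/log/time/grpc imports:)

    // Matches the grpc.UnaryServerInterceptor signature from google.golang.org/grpc.
    func loggingInterceptor(
        ctx context.Context,
        req interface{},
        info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler,
    ) (interface{}, error) {
        start := time.Now()
        resp, err := handler(ctx, req) // invoke the real RPC handler
        log.Printf("%s took %v err=%v", info.FullMethod, time.Since(start), err)
        return resp, err
    }

    // Installed once at server construction:
    //   srv := grpc.NewServer(grpc.UnaryInterceptor(loggingInterceptor))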

I recently started contributing to Buffalo, the Go web library by Mark Bates, and it addresses a lot of the issues we had, such as managing webpack for development environments.

Fun fact: I wrote a quick scraper in Go to relearn the language before jumping into V2, and Francesc from Google did a code review of it for the first "justforfunc" episode. That made me realize just how much I had to improve my Go code: https://www.youtube.com/watch?v=eIWFnNz8mF4&t=2s


> I really enjoyed that episode :)

> Curious, since it was just you writing the backend, what was the reasoning behind doing microservices + a React SPA? Both of those require a lot of commitment and coordination. Microservices especially are something I'd be more likely to consider with a very large team or many teams, rather than a single developer.

> Would you choose a microservice architecture again?

For microservices - we were planning integrations with messy systems. Think polling an API, dealing with XML, or doing custom auth on lots of different incoming data endpoints. Microservices let us add new experiments (like our iCal service) without adding complicated logic to the central datastore. For reference, we already had API keys and signed agreements with four integration partners (POS, HR, etc.) that we wanted to roll out in a short period, with more lining up. We also wanted to add messaging capabilities beyond SMS quickly, which is why we architected the bot the way we did - some providers had strict sending limits (like Twilio), while others did not require as strict a queue.

For React - I didn't pick it, and I'm actually completely unfamiliar with the library. I'm learning VueJS now, and that would be my choice for future SPAs. Now that I understand how shared components work, I would have changed a lot of the system design.

I'm already starting on my next company. It's a monorepo with three folders: pkg (packages), cmd (any `package main` commands), and static (all JS, SCSS, etc. that gets built by Webpack and then wrapped up in a single bindata.go). The primary app is a monolith. However, I'm making it possible to add additional services, such as cron jobs in separate containers or a command-line utility based on the same protobuf definitions.
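
Roughly this layout (the subfolder names are my own illustration):

    pkg/        # shared library packages
      auth/
      models/
    cmd/        # one folder per `package main`
      server/   # the monolithic primary app
      cron/     # jobs that run in separate containers
    static/     # JS, SCSS, etc. - built by Webpack into bindata.go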


Thanks for being so transparent and sharing your learnings! Good luck with your future project.




It was pretty raw. We didn't have many full-time people concentrating on it by the end - lots of contractors. We needed to spend a lot more time on tooling.


Hi Philip, does your app build really take 20 minutes?


The Bazel build system [1] is the open-source version of Google's internal build system. It's tough to set up, but once you do, it caches all builds, down to the Docker container generation. On changes, it analyzes which targets are affected and selectively rebuilds only those. So, after an initial build, rebuilds are blazingly fast. In fact, because the simple act of creating a pull request triggers a CI build, most production builds/deploys come straight from cache.

Here [2] is a screenshot of the actual "master branch test, lint, build, and deploy 15 containers to kubernetes" job from our Jenkins. It took about 10 minutes on the build box every time.

That being said, in order to make builds work with Bazel, we had to do some wonky stuff. We committed built JavaScript files, and we had to manually commit a lot of other generated files (like bindata.go and protobuf outputs).
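
For the curious, a rules_go BUILD file looks roughly like this (a hedged sketch - target names and paths are illustrative, not the actual repo's files). Because every target declares its inputs, Bazel can cache each one and rebuild only what a change actually touches:

    load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library")

    go_library(
        name = "go_default_library",
        srcs = ["main.go"],
        importpath = "github.com/staffjoy/v2/whoami",  # illustrative path
    )

    go_binary(
        name = "whoami",
        embed = [":go_default_library"],
    )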

[1] http://bazel.build

[2] <I'll drop this in as soon as Imgur/S3 is back online . . . >




