Chisel – A fast TCP tunnel over HTTP (github.com/jpillora)
101 points by aus_ on Jan 9, 2017 | 38 comments



Why TCP over websockets? You can just use the HTTP bodies as a stream in both directions, which means the proxy just has to strip or add HTTP headers before forwarding. The overhead afterwards is zero: you just write to the socket.


Hmm, this choice is indeed strange; websockets are still blocked in some restrictive setups (squid?).

But still, how do you stream in both directions? Do you mean opening a multipart form-data upload and a chunked transfer-encoded download? That would be two TCP connections for one TCP tunnel. And I believe there's no other way to do it without the overhead of HTTP request/response headers.


Technically, HTTP represents a stream of arbitrary data in each direction, which follows a set of headers and is optionally finished by a set of footers (trailers). This is for example quite obvious when you look at the HTTP/2 specification. There is no need for the bodies to be sent in a particular order (response body after request body) or in a particular encoding (form-data, chunked, SSE, etc.). It can be two arbitrary byte streams.

This functionality is exposed by lots of HTTP libraries: e.g. the Go, node.js or C# HTTP libraries let you simply read from or write to a request/response body stream, just like you would with a socket. As a proxy, just copy the data from one stream to the other.

What you are probably thinking of is that the browser environment currently does not allow using HTTP for arbitrary streaming. Instead, browsers define some fixed use cases and encodings for them. If you use SSE encoding you can no longer send bytes without overhead, and you can only stream in one direction. But you get a nicer browser API for retrieving the data. Using HTTP bodies for arbitrary streaming will be allowed in future revisions of the fetch API (which e.g. give you a ReadableStream).

Other HTTP libraries also do not allow streaming but expect the body in either direction to be a fixed-length thing. That lets them represent the body as e.g. a "string" or "byte[]" instead of a Stream from which the application has to read. The API gets simpler, but not all possible use cases are enabled.


It's true that the underlying TCP connection can be reused for virtually anything and doesn't have to conform to HTTP request/response semantics. But it would probably get blocked by a proxy which wouldn't be able to pass it down the line.


Not necessarily ... you would have to issue an HTTP request per uplink chunk, but HTTP can use connection pooling so that does not necessarily translate to a single TCP connection [0].

Agreed it's not quite as straightforward as the parent poster suggests. I can see issues with this approach for realtime/streaming applications but for applications relying on a similar request/response flow of traffic it would do the job.

[0] https://en.wikipedia.org/wiki/HTTP_persistent_connection


Yes, and for various reasons those chunks can be reordered. And a misbehaving proxy might also try to cache them. And to send data back from the server you need long-hanging GETs, which can also be subject to timeouts and weird chunking by misbehaving proxies.

These problems are all solvable, but you need to treat HTTP requests like datagrams and (basically) reimplement TCP on top of HTTP.
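
To make the "requests as datagrams" point concrete, here is a small Go sketch of just the reordering part, assuming every uplink request carries a sequence number. The `reassembler` type and its method names are invented for this example and are not taken from browserchannel or SockJS. A real implementation would also need acknowledgements, retransmission, flow control and a bound on buffered data.

```go
package main

// reassembler hands chunks to the application in sequence order, even when
// the HTTP requests carrying them arrive out of order or more than once.
type reassembler struct {
	next    uint64            // sequence number expected next
	pending map[uint64][]byte // chunks that arrived early
}

func newReassembler() *reassembler {
	return &reassembler{pending: make(map[uint64][]byte)}
}

// accept records chunk seq and returns every chunk that is now deliverable,
// in order. Duplicates of delivered or already buffered chunks are ignored.
func (r *reassembler) accept(seq uint64, data []byte) [][]byte {
	if seq < r.next {
		return nil // already delivered
	}
	if _, dup := r.pending[seq]; dup {
		return nil // already buffered
	}
	r.pending[seq] = data

	var ready [][]byte
	for {
		chunk, ok := r.pending[r.next]
		if !ok {
			break // gap: wait for the missing chunk
		}
		delete(r.pending, r.next)
		ready = append(ready, chunk)
		r.next++
	}
	return ready
}
```

Feeding it chunk 1 before chunk 0 returns nothing for the first call and both chunks, in order, for the second.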

We've done this several times now. I made one[1] myself a few years ago based on Google's browserchannel implementation (which was first written for gchat inside Gmail and supported browsers down to IE5.5). But the best is probably SockJS - https://github.com/sockjs . IIRC it's written by some (ex?) VMware guys, and it's great.

But all this stuff is pretty outdated now. Misbehaving corporate proxies are (thankfully) getting much rarer - especially if you tunnel your traffic over HTTPS.

These days you should just use websockets directly.

[1] https://github.com/josephg/node-browserchannel


Isn't connection pooling just using several connections to download one resource using content-range?


Connection aggregation?


I remember trying to run SSH (the current TCP multiplexing protocol) over straight HTTP and finding there were issues, so I added WS to the mix. There probably is a way to do it, though it does the job as is.


Hold on, hold on, hold on. Let me get this straight: you took an application layer protocol (HTTP) that runs on top of TCP and ...reimplemented TCP over it!?

(see: https://www.quora.com/What-is-the-difference-between-HTTP-pr...)


Pfft. Amateurs!

> Once upon a time at evening I have decided to properly brake the famous browsers communication problem and as a result you have landed in here. I have created implementation of BNC networks model with simple TCP/IP layer, that as transport packet it will use browser's cookie object.

http://theprivateland.com/bncconnector/index.htm


No, TCP has not been reimplemented.


Looks to be TCP over websockets; which isn't really that interesting IMO.


Yep, because many firewalls ALSO block websockets; so comparing it to crowbar for speed isn't terribly fair, as crowbar only uses POST/GETs.


They can't block websockets over https.


Most corp+school firewalls block https or make the user add a MITM cert (well, it's managed via active directory for company machines). Rarely do these firewalls allow websockets.


I don't think that any company can still block https, there are not many pages left you could use.

Installing a MITM is probably more common. Although I don't understand the benefit of then blocking ws, if you can read the traffic anyway?


How common is this, actually? My impression is it's fairly rare, but I'd love to see some actual analytics if anyone has some.


K-12 schools definitely block HTTPS at some places, mostly because they can't inspect it.


Wow, so they can only use HTTP? That seems crazy. Would be fun to try and mess with the network though. Yeesh.


Not common anymore. Now it's usually an ssl proxy that uses a cert on all computers to mitm the connection and enforce policies.


How does this deal with double congestion control?


I was asking myself the same thing. TCP implements many RFCs for congestion control, flow control, etc. Many of them might be redundant if everything is being sent over HTTP (over TCP).

I would use "link conditioner" or a similar tool, simulate packet loss and see how this compares to the other software.


http://sites.inka.de/sites/bigred/devel/tcp-tcp.html

Even for applications designed to tunnel traffic from the outset (OpenVPN), TCP over TCP is a mess and it's kind of a non-starter for anything that's not a toy (unless you have no other choice, like with certain mobile carriers where path MTU and CGN cause issues).


That's very interesting, thank you. I think there should be a way to control retransmission times on the client that would alleviate that problem. Probably taking the TCP connection out of kernel space and controlling the protocol in user space would let the lower-level TCP work nicely with the upper levels. But still, is there a possibility of a meltdown on routers further down the line that might be controlling TCP connections? These would be routers that don't simply pass IP packets but work with the protocol in a more elaborate way, through deep packet inspection or some other network mechanism.


It doesn't. I'd like to support UDP at some point, maybe rewrite using https://github.com/xtaci/kcp-go.


Annoying that this has the same name as https://github.com/ucb-bar/chisel3


Sorry! Was originally a rewrite of https://github.com/q3k/crowbar so I chose a synonym.


TireIron. ;- )


Yup. Same thing with Spark Framework wrt Apache Spark.

Seems like a curse targeting game-changing projects written in Scala.


And no reference to the standard mechanism for HTTP tunnels, which has only existed for 18 years.

https://tools.ietf.org/html/rfc2616#section-9.9


Many superlatives, which immediately raise suspicions.

If you are trying to solve the problem of NAT traversal and such, I suggest you rather attempt to do this:

https://en.wikipedia.org/wiki/TCP_hole_punching


This surely is great, and I can't wait for the moment when someone finally comes up with TCP Hole Punching as a Service.

vpnazure kinda does this, but has the overhead of softether vpn service on top of it... I would rather go with punching myself an ssh port.


I do it at Wormhole[1] in a very similar fashion to "vpnazure"; also with SoftEther.

Why would you rather "punch yourself an ssh port"? Do you mean that your main problem with vpnazure and the like is the need for agent/client software to be installed? I am not sure I understood your concern, but I would be very interested in hearing more about it. Feel free to email me at the address on my profile if you prefer, although a reply here works for me.

IMHO the best way to make it transparent for any application is to have a virtual interface. It offers an expected environment for any new or legacy app (instead of proxying stuff explicitly).

Thanks!

[1] https://wormhole.network


That seems like an awesome service, thank you for the work!

> main problem with vpnazure and the like is the need of an agent/client software installed

Yes, and at the same time vpnazure creates a vpn set up (I think it's server/client set up, not bridge), while for my simple usage I only need a single TCP connection through which I'd work with SSH. And from this point, I could spawn a ssh tunnel and forward the needed traffic through it, alleviating the need of vpn. Maybe it is a really basic use case, but for work/home environment I find it to be the thing I actually need, and not the full blown vpn.

Another problem with vpnazure I had is that I'd have no way of seeing where the traffic flows inside the vpn interface. Thinking about it now, probably could be seen in traceroute. But at the time, I thought about looking into tcpdump of vpnserver or setting up a firewall. And that was too complicated for my hobby set up. The point of my concern was that I wanted to see whether any traffic is leaking onto third party servers managed by SoftEther or otherwise. Of course I'd want the traffic to flow across the internet, but I'd expect it to take the path it would take if one of the nodes was a natural server.

Also, all the traffic is managed by vpnserver (softether's one), which makes it a little opaque in terms of where the packets go out of that process.

Of course a client would be inevitable for any hole punching SaaS, but preferably I'd like it to run only while the connection is being established.

That's my 2 cents, coming from a personal set up.


Hi! Thanks for sharing your opinion!

I think what you're looking for is https://ngrok.com/ - it's quite popular among developers.

> Another problem with vpnazure I had is that I'd have no way of seeing where the traffic flows inside the vpn interface. Thinking about it now, probably could be seen in traceroute. But at the time, I thought about looking into tcpdump of vpnserver or setting up a firewall. And that was too complicated for my hobby set up. The point of my concern was that I wanted to see whether any traffic is leaking onto third party servers managed by SoftEther or otherwise. Of course I'd want the traffic to flow across the internet, but I'd expect it to take the path it would take if one of the nodes was a natural server.

> Also, all the traffic is managed by vpnserver (softether's one), which makes it a little opaque in terms of where the packets go out of that process.

I see what you mean and I understand your concerns.

If you'd like to see the path from your connection to the VPN servers you can always do a traceroute to its public IP. However for concerns regarding what they do at the server level, if a 3rd party manages the VPN server, you only have your trust in them and their degree of transparency. Next step would be to go self-hosted, but then you need to trust the hosting provider too.

Thank you!


chisel is for bypassing firewalls, not for solving NAT traversal


Is there any difference with corkscrew? Either way, really keen to test the performance of it for getting out of restricted proxies where corkscrew saves my life.



