Tuesday, December 7, 2010

ByteString support in network

Not so long ago, I merged the network-bytestring package into the network package. This addresses two problems with the old network API:

  • Performance: String is implemented as a linked list of boxed characters, which isn't a very efficient way to store binary data. ByteString was designed with efficiency in mind and has a memory overhead of just a few words per string, which is acceptable given a typical network message size of e.g. 4096 bytes.

  • Correctness: String is for storing Unicode text, not binary data. The String-based functions effectively assume ISO 8859-1, which can lead to subtle errors as soon as you need to send text in any other encoding.

The new API exposes two new modules: Network.Socket.ByteString and Network.Socket.ByteString.Lazy. Both define ByteString versions of the different variants of send and recv.
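
For example, a minimal TCP client using the strict ByteString API might look like the sketch below. The host, port, and request are placeholders, and error handling is omitted:

    -- Minimal TCP client sketch using the strict ByteString API.
    -- "example.org", port 80 and the request are placeholders;
    -- error handling and cleanup are omitted for brevity.
    import Network.Socket hiding (send, recv)
    import Network.Socket.ByteString (sendAll, recv)
    import qualified Data.ByteString.Char8 as B

    main :: IO ()
    main = withSocketsDo $ do
        -- Resolve the peer address.
        (addr:_) <- getAddrInfo Nothing (Just "example.org") (Just "80")
        sock <- socket (addrFamily addr) Stream defaultProtocol
        connect sock (addrAddress addr)
        -- send and recv now work on strict ByteStrings.
        sendAll sock (B.pack "GET / HTTP/1.0\r\nHost: example.org\r\n\r\n")
        reply <- recv sock 4096
        B.putStrLn reply
        sClose sock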

While not formally deprecated, I would advise against using the String-based recv and send functions in new code, for the reasons given above.

In addition, the new modules add support for scatter/gather I/O, which lets you, for example, send several small chunks of data with a single system call. This minimizes the number of context switches and avoids the unnecessary user-space copying you would get from concatenating the chunks first.
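
As a sketch, sendMany takes a list of chunks and hands them to the OS in one gather write. Here sock is assumed to be an already connected Socket, and sendResponse and the header chunks are just made-up placeholders:

    -- Gather-output sketch: write several chunks with one call instead
    -- of concatenating them first. 'sock' is assumed to be a connected
    -- Socket; the header chunks are placeholders.
    import Network.Socket (Socket)
    import Network.Socket.ByteString (sendMany)
    import qualified Data.ByteString.Char8 as B

    sendResponse :: Socket -> B.ByteString -> IO ()
    sendResponse sock body =
        sendMany sock
            [ B.pack "HTTP/1.1 200 OK\r\n"
            , B.pack "Content-Type: text/plain\r\n\r\n"
            , body
            ]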

16 comments:

  1. Hi Johan,

    thank you for this nice package. I have been trying recently to do a simple transparent tcp proxy with Network.Socket.ByteString, much like Chris Done Throttle package ( https://github.com/chrisdone/throttle ).
    While the code ended up clean (thanks to your package's API), the performance was not as good as I expected: the CPU was the bottleneck, when it really should have been the wire I/O.
    I'm still on GHC 6.12, so it might simply be an IO manager problem.

  2. That's surprising. network is just a thin layer on top of the OS system calls. How many simultaneous open connections do you have? Have you tried CPU profiling to pinpoint the culprit?

  3. Paul, Johan and I do a lot of network development and benchmarking, and we've not seen a problem like yours. There are no raw performance problems with the IO manager in 6.12; its known issues are scaling problems with lots of connections or thread sleeps.

  4. Interesting. When are you going to deprecate the String-based API?

  5. Dean, it's likely to be a long process. You can read the discussion here: http://thread.gmane.org/gmane.comp.lang.haskell.libraries/14241

  6. When I use connectTo from network, will I end up with a ByteString or String Socket? Does it depend on whether I use the latest version of network? I am already using ByteString's hPut to write the data to the connection handle.

  7. Whether e.g. a recv on a Socket will return a ByteString or String is not a property of the Socket, but of the function you call. If you call Network.Socket.ByteString.recv you will get a ByteString and if you call Network.Socket.recv you will get a String. I recommend using the former.

    I recommend working with Sockets instead of Handles. In my opinion, Handles conflate too many concepts (e.g. buffering, Unicode, newline conversion) into one data type. Also, Handles don't work correctly with UDP, as they raise EOF for zero-length messages, which are perfectly valid UDP messages.
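
    As a rough illustration, a UDP echo loop can work on the Socket directly, where an empty datagram simply comes back as an empty ByteString. The port number below is arbitrary and error handling is left out:

        -- Rough UDP echo sketch on a plain Socket: a zero-length datagram
        -- arrives as an empty ByteString instead of triggering EOF.
        -- Port 9999 is arbitrary; no error handling.
        import Network.Socket hiding (recvFrom, sendTo)
        import Network.Socket.ByteString (recvFrom, sendAllTo)
        import Control.Monad (forever)

        udpEcho :: IO ()
        udpEcho = withSocketsDo $ do
            sock <- socket AF_INET Datagram defaultProtocol
            bindSocket sock (SockAddrInet 9999 iNADDR_ANY)
            forever $ do
                (msg, peer) <- recvFrom sock 4096
                sendAllTo sock msg peer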

  8. Your network link points to network-bytestring.

    So if I understand correctly, using Network.connectTo will use the String send/recv functions. I don't necessarily want a Handle back from connectTo, but I want an interface at that level: one that just takes the host and port number as arguments.

    Is there an existing technique for this? Or can we make a Network module that does use the ByteString send and recv?

  9. (Fixed the package link.)

    I intend to add a Network.Socket.connectTo function that's identical to Network.connectTo, but returns a Socket instead of a Handle. Stay tuned.
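
    Roughly, I have something like the following in mind. This is only a sketch built on getAddrInfo; connectTo' is a placeholder name and the eventual function may well differ:

        -- Sketch only: open a TCP connection and hand back the Socket.
        -- The eventual API may look different.
        import Network.Socket

        connectTo' :: HostName -> ServiceName -> IO Socket
        connectTo' host port = do
            addrs <- getAddrInfo (Just defaultHints { addrSocketType = Stream })
                                 (Just host) (Just port)
            case addrs of
                []         -> ioError (userError "connectTo': host not found")
                (addr : _) -> do
                    sock <- socket (addrFamily addr) (addrSocketType addr)
                                   (addrProtocol addr)
                    connect sock (addrAddress addr)
                    return sock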

  10. Johan, Bryan, thank you for your interest.

    As I said, I wanted to experiment with a simple transparent TCP proxy, much like https://github.com/chrisdone/throttle but even simpler. If one of you could provide a minimal implementation of such a TCP proxy, designed for good performance, I could benchmark it again. I have done another implementation in another language, single-threaded and epoll-based, with the EventMachine library. I am impressed by its performance, although I don't particularly appreciate the explicit event-driven programming it requires.

    I'd be happy to bench it against a simple reference implementation using Network.Socket.ByteString, if you have time for a proposal.

    Regards,

  11. Paul, out of curiosity (and ignorance): does such a proxy require an asynchronous DNS resolver to be efficient?

  12. Johan, that is indeed a good point. With explicit event-driven programming and single-threaded event processing, every I/O instruction should be asynchronous; otherwise the whole system can stall. Pure computations are usually synchronous, but predictably short.
    DNS resolution falls into the first category, but because of the very few addresses I have to proxy to (my use case is a reverse proxy), I do synchronous DNS and rely mostly on the OS cache.

    I propose a very basic benchmark: downloading a distro ISO over 50 concurrent connections passing through the proxy. That should reduce the impact of request handling and DNS resolution.

  13. Paul,

    Benchmarks are always welcome. Especially ones that are easy to run. I don't have time to write any at the moment but if you have one that shows performance issues I can try to find some time to look into it.

  14. Johan,

    here is a simple implementation, largely based on Chris Done's package:

    https://gist.github.com/738025

    Tune the remote host and port if needed, then start it. Then start 50 wget processes on http://localhost:8000/PATH-TO-AN-ISO-ON-REMOTE-SERVER

    On my machine, the proxy reaches 100% CPU usage at approximately 25 connections. Below 20 connections it seems to fare better and uses only 10% of the CPU time, which is already a lot for the job :)

  15. Thanks for the code Paul. I'll try to have a look at it whenever I have time. One thing I think could help performance would be to use sendfile to copy data between the sockets. That would avoid some needless user space copying.

  16. Johan, in case of interest, here are the figures from +RTS -s for this very simple proxy, after a minute of proxying 35 continuous downloads:

    165,454,791,928 bytes allocated in the heap
    37,482,740 bytes copied during GC
    522,428 bytes maximum residency (275 sample(s))
    888,484 bytes maximum slop
    3 MB total memory in use (0 MB lost due to fragmentation)

    Generation 0: 309958 collections, 0 parallel, 2.82s, 5.30s elapsed
    Generation 1: 275 collections, 0 parallel, 0.07s, 0.10s elapsed

    INIT time 0.00s ( 0.00s elapsed)
    MUT time 14.49s ( 50.92s elapsed)
    GC time 2.89s ( 5.41s elapsed)
    EXIT time 0.00s ( 0.00s elapsed)
    Total time 17.38s ( 56.33s elapsed)

    %GC time 16.6% (9.6% elapsed)

    Alloc rate 11,416,253,523 bytes per MUT second

    Productivity 83.4% of total user, 25.7% of total elapsed
