Learning the Haskell FFI with C2HS

22 Sep 2013haskell

I’ve recently started helping with the maintenance of C2HS, a tool for generating Haskell foreign function interface (FFI) bindings from C header files. I started using C2HS because of a Haskell library I was writing to read Unidata NetCDF files. I didn’t fancy writing all of the bindings to the hundreds of functions in the C NetCDF library by hand and C2HS seemed like a good way to get started with the Haskell FFI.

The Haskell FFI is well-documented in the Language Report and the Haddock documentation for the Foreign modules that define various helper types and marshalling functions (Foreign and Foreign.C are good starting points). However, even with the documentation, the FFI is pretty complicated, and there are lots of choices for marshalling and for allocating memory for communication between Haskell code and C code: there are Ptrs and ForeignPtrs, there are various functions for allocating memory with different lifetimes, there are lots of types floating around, and it’s quite confusing when you’re getting started.

This is where I found starting with C2HS really useful. It’s very easy to write the C2HS specification for a C function, then you can run the C2HS tool over your code and look at the Haskell marshalling code it produces. For instance, for the C function

double foo(int n, int *status);

where n is an input parameter and status is used as an output parameter, the C2HS specification looks like:

{#fun foo { `Int', alloca- `Int' peekIntConv* } -> `Double' #}

The annotations around the second parameter to foo are used to indicate that some memory needs to be allocated for the status pointer parameter and that the value of this pointer needs to be accessed and returned monadically (the C2HS documentation is pretty good about explaining these things). After running C2HS on this definition, you get some Haskell code that looks like this (after cleaning up a little and renaming some things):

foo :: Int -> IO (Double, Int)
foo n = let n1 = fromIntegral n
        in alloca $ \ptr ->
        foo'_ n1 ptr >>= \res ->
        let resout = realToFrac res
        in peekIntConv pre >>= \ptrout ->
        return (resout, ptrout)

foreign import ccall safe "Eg1.chs.h foo"
  foo'_ :: CInt -> Ptr CInt -> IO CDouble

peekIntConv :: (Storable a, Integral a, Integral b) => Ptr a -> IO b
peekIntConv = liftM fromIntegral . peek

As well as the foreign import of the foo function from its C header file, this has a foo function that deals with all of the type conversion and argument and result marshalling between Haskell and C: conversion between Haskell Int and Double types and CInts and CDoubles for communicating with C functions is done using fromIntegral and realToFrac, allocation of memory and result extraction from the integer pointer status parameter is done, with the sequence of allocation, C function call and result extraction being ordered monadically.

As a more complicated example, consider this function from the NetCDF C library:

int nc_def_var(int ncid, const char *name, nc_type xtype, int ndims,
               const int *dimidsp, int *varidp);

A suitable C2HS specification for this function is:

{#fun nc_def_var { `Int', `String', `Int', `Int',
                   withIntArray* `[Int]',
                   alloca- `Int' peekIntConv* } -> `Int' #}

This function has a C string parameter as well as an integer array passed as an input parameter (which we’re going to represent as a Haskell list) and an integer pointer used as an output parameter. The withIntArray function is a marshalling helper. From this specification, C2HS produces the following Haskell code:

nc_def_var :: Int -> String -> Int -> Int -> [Int] -> IO (Int, Int)
nc_def_var a1 a2 a3 a4 a5 =
  let a1' = fromIntegral a1 in
  withCString a2 $ \a2' ->
  let a3' = fromIntegral a3 in
  let a4' = fromIntegral a4 in
  withIntArray a5 $ \a5' ->
  alloca $ \a6' ->
  nc_def_var'_ a1' a2' a3' a4' a5' a6' >>= \res ->
  let res' = fromIntegral res in
  peekIntConv  a6'>>= \a6'' ->
  return (res', a6'')

foreign import ccall safe "Data/NetCDF/Raw/Base.chs.h nc_def_var"
  nc_def_var'_ :: CInt -> Ptr CChar -> CInt -> CInt -> Ptr CInt -> Ptr CInt -> IO CInt

withIntArray :: (Storable a, Integral a) => [a] -> (Ptr CInt -> IO b) -> IO b
withIntArray = withArray . liftM fromIntegral

This gives a pretty good idea of how to deal with marshalling of input and output arguments of different types, including C strings (it uses the withCString function from Foreign.C.String) and arrays (the withIntArray function is just a specialisation of withArray from Foreign.Marshal.Array).

Once you’ve looked at a few examples like this (and more complex ones), it becomes much easier to figure out how to abstract these patterns for more complex cases. For most of the functions in my NetCDF library, I don’t use C2HS specifications directly, but use parameterised functions I wrote based on experiments done with C2HS. In the beginning, I would definitely have had a hard time writing this sort of parameterised FFI function without the examples provided by C2HS. From that perspective, C2HS is really a pretty neat tool for learning how to use the Haskell FFI (you could probably do just the same kind of thing with hsc2hs or similar tools).