COP 4020 Lecture -*- Outline -*-

* Dataflow Parallelism in Haskell
   Most of this material taken from chapter 4 of
    "Parallel and Concurrent Programming in Haskell" by Simon
    Marlow, published by O'Reilly Media, Inc, 2013.
    http://chimera.labs.oreilly.com/books/1230000000929/index.html

> module DataFlow where
> import Control.DeepSeq
> import Data.List (sort)

** motivation

------------------------------------------
 DATAFLOW NETWORKS (PAR MONAD) MOTIVATION

Goals: 
  + more explicit granularity and data dependencies
  + still deterministic

Compared to Eval and Strategies:
  - more overhead per task,
     so Par is better for coarser granularity

------------------------------------------

** The Par Monad (Section 2.3)

------------------------------------------
    The Par Monad (Control.Monad.Par)

> import Control.Monad.Par

data Par a
runPar :: Par a -> a      -- does computation
fork :: Par () -> Par ()  -- creates a task

instance Monad Par
   
------------------------------------------
       runPar runs the computation (possibly in parallel)
          and returns its result as an ordinary value;
          each call fires up a new scheduler
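
        A minimal sketch of the interface above, assuming the monad-par
        package is installed. The names addPar and result are made up
        for illustration. It shows only that Par computations are
        sequenced with the Monad instance and that runPar returns an
        ordinary (pure) value.

```haskell
import Control.Monad.Par (Par, runPar)

-- Hypothetical helper: a Par computation that just returns a sum.
addPar :: Int -> Int -> Par Int
addPar x y = return (x + y)

-- runPar has a pure type, so the result is an ordinary Int.
result :: Int
result = runPar $ do
  s <- addPar 2 3      -- s is 5
  addPar s 10          -- the final value is 15
```

        Nothing here runs in parallel yet, since no task is forked.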

*** IVars
------------------------------------------
      COMMUNICATION USING IVar

-- also in Control.Monad.Par

data IVar a  

new :: Par (IVar a)    -- makes a future
put :: NFData a => IVar a -> a -> Par ()
get :: IVar a -> Par a

-- example use:

> testPar n m = 
>   runPar $ do
>            i <- new
>            j <- new
>            fork (put i (fib n))
>            fork (put j (fib m))
>            a <- get i
>            b <- get j
>            return (a+b)
>
> fib n = if n < 2 then n 
>         else (fib (n-2))+(fib (n-1))


------------------------------------------
        Note that an IVar is write-once: put may be used on it only once
        (a second put on the same IVar is a runtime error)

        ... draw a picture of the dataflow network this creates

                      i    j
                       \  /
                         +
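
        The same idea extends to networks with internal dependencies.
        The following is a sketch only (the name diamond and the
        arithmetic are made up for illustration): one IVar feeds two
        forked tasks, and the parent waits for both of their results.

```haskell
import Control.Monad.Par (runPar, fork, new, put, get)

-- Hypothetical network:
--        a
--       / \
--      b   c
--       \ /
--        +
diamond :: Int -> Int
diamond x = runPar $ do
  a <- new
  b <- new
  c <- new
  fork (put a (x + 1))                    -- source node
  fork (do v <- get a; put b (v * 2))     -- waits for a, then fills b
  fork (do v <- get a; put c (v * 3))     -- waits for a, then fills c
  vb <- get b                             -- blocks until b is filled
  vc <- get c                             -- blocks until c is filled
  return (vb + vc)
```

        Each IVar is written exactly once, but it may be read by any
        number of tasks, as a is here.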

*** examples
**** spawn
------------------------------------------
            SPAWNING A FUTURE

> spawn' :: NFData a => Par a -> Par (IVar a)
> spawn' p = do i <- new
>               fork (do x <- p
>                        put i x)
>               return i

------------------------------------------
        ... spawn' is like the built-in spawn function in Control.Monad.Par:
            it creates an IVar, i, forks a computation that puts
            its value in i, and returns i immediately, without waiting
            for the forked computation to finish.
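
        As a sketch of how this shortens client code, the earlier
        new/fork/put pattern can be written with spawnP, the variant of
        spawn in Control.Monad.Par that takes a pure value instead of a
        Par computation. The name testSpawn is made up for illustration,
        and fib is repeated (with a fixed type) so the sketch stands alone.

```haskell
import Control.Monad.Par (runPar, spawnP, get)

fib :: Integer -> Integer
fib n = if n < 2 then n else fib (n - 2) + fib (n - 1)

-- Hypothetical rewrite of the two-task example using spawnP.
testSpawn :: Integer -> Integer -> Integer
testSpawn n m = runPar $ do
  i <- spawnP (fib n)    -- forks a task and returns its IVar
  j <- spawnP (fib m)
  a <- get i
  b <- get j
  return (a + b)
```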

**** divide and conquer algorithms

------------------------------------------
           DIVIDE AND CONQUER

pattern of divide and conquer algorithms

> divideAndConqueor :: (NFData solution) 
>     => Int
>     -> (problem -> (problem, problem))
>     -> (problem -> solution)
>     -> (solution -> solution -> solution)
>     -> problem
>     -> solution
> divideAndConqueor maxdepth split solve combine prob = 
>     runPar $ solveIt 0 prob
>     where
>       solveIt d prob | d >= maxdepth
>                          = return (solve prob)
>       solveIt d prob = 
>           do let (left, right) = split prob
>              lv <- spawn (solveIt (d+1) left)
>              rv <- spawn (solveIt (d+1) right)
>              ls <- get lv
>              rs <- get rv
>              return (combine ls rs)

------------------------------------------

        This partitions the problem down to depth maxdepth
        (up to 2^maxdepth subproblems), solves the subproblems in
        parallel, and combines their solutions on the way back up
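
        A sketch of a client of this pattern, summing a list by
        repeated halving. The names halve and psum and the depth 3 are
        made up for illustration. The skeleton is repeated in condensed
        form only so that the sketch stands alone.

```haskell
import Control.DeepSeq (NFData)
import Control.Monad.Par (runPar, spawn, get)

-- Condensed copy of the skeleton from the lecture.
divideAndConqueor :: NFData solution
                  => Int
                  -> (problem -> (problem, problem))
                  -> (problem -> solution)
                  -> (solution -> solution -> solution)
                  -> problem
                  -> solution
divideAndConqueor maxdepth split solve combine = runPar . solveIt 0
  where
    solveIt d prob
      | d >= maxdepth = return (solve prob)
      | otherwise = do
          let (left, right) = split prob
          lv <- spawn (solveIt (d + 1) left)
          rv <- spawn (solveIt (d + 1) right)
          ls <- get lv
          rs <- get rv
          return (combine ls rs)

-- Hypothetical split function: cut the list in the middle.
halve :: [a] -> ([a], [a])
halve xs = splitAt (length xs `div` 2) xs

-- Hypothetical client: depth 3 gives 2^3 = 8 leaf subproblems,
-- each summed sequentially with sum, then combined with (+).
psum :: [Int] -> Int
psum = divideAndConqueor 3 halve sum (+)
```

        Because the split happens at every level regardless of input
        size, short lists still produce 8 (possibly empty) leaves.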

------------------------------------------
            QUICKSORT USING PAR

This actually does give some speedup when run on multiple cores

-- using sort from Data.List

> pqsort :: (Ord a, NFData a) => [a] -> [a]
> pqsort xs = 
>     divideAndConqueor 2 psplit sort (++) xs

> psplit :: (Ord a) => [a] -> ([a],[a])
> psplit [] = ([],[])
> psplit (x:xs) = let small = [e | e <- xs, e <= x]
>                     large = [e | e <- xs, e > x]
>                 in (small, x:large)

------------------------------------------

** summary of dataflow parallelism with the Par monad
------------------------------------------
        SUMMARY: EVAL vs. PAR

When to use Eval and Strategies:

 - when you want a lazy result
 - when you don't want to thread a monad
   throughout your code
 - when you need to cleanly separate 
   algorithm from parallelism (Strategies)
 - when the parallelism is fine grained
 - when you need speculative parallelism

When to use dataflow parallelism and Par:

 - when the answer must be fully evaluated
 - when Par can be threaded throughout,
   avoiding multiple calls to runPar
 - when the parallelism is coarse grained

------------------------------------------
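
        To make the contrast concrete, here is a sketch of one
        computation written both ways. It assumes both the parallel
        package (Control.Parallel.Strategies) and the monad-par package
        are installed. The names sumEval and sumPar are made up for
        illustration.

```haskell
import Control.Monad.Par (runPar, spawnP, get)
import Control.Parallel.Strategies (runEval, rpar, rseq)

fib :: Integer -> Integer
fib n = if n < 2 then n else fib (n - 2) + fib (n - 1)

-- Eval version: rpar sparks fib n, rseq evaluates fib m here,
-- then we wait for the spark. Evaluation is only to weak head
-- normal form, which is enough for an Integer.
sumEval :: Integer -> Integer -> Integer
sumEval n m = runEval $ do
  a <- rpar (fib n)
  b <- rseq (fib m)
  _ <- rseq a
  return (a + b)

-- Par version: each spawnP forks a task whose result is fully
-- evaluated (via NFData) before it is put into its IVar.
sumPar :: Integer -> Integer -> Integer
sumPar n m = runPar $ do
  i <- spawnP (fib n)
  j <- spawnP (fib m)
  a <- get i
  b <- get j
  return (a + b)
```

        Both functions are deterministic and return the same value;
        they differ in how the parallel work and its dependencies are
        expressed.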