Tuesday, December 19, 2006

F# Life

As I said before, I've been getting very into the idea (if not the practice) of functional programming recently, so I was really excited to read about F#. I'm not the only one, there seems to be a general buzz about this new programming language from Microsoft research in Cambridge. The guy who designed it, Don Syme also played a big part in designing the excellent generics implementation in the CLR and has implemented a number of new CLR features to support F# and functional languages in general. I guess in this respect you can see F# as a test bed for pushing the CLR in some exciting new directions. The great thing about F# is that it allows you to leverage all of the .net framework from a functional language with great Visual Studio integration, including an interactive shell that you can use in any project, not just F# ones. Of course, to use the .net libraries, F# has to include object oriented and imperative programming styles as well and while this has the effect of polluting its pure 'functionalness' it also promise to make it a real swiss army knife of programming techniques. Don Syme has got some great examples of F# utilising the power of .net on his blog.

I'm a total novice at functional programming and as I'm still learning about OO programming even after 10 years, I'm sure I'm just starting off on a very long path of discovery. I really don't get a lot of the higher level concepts I've been reading about but some of the simple stuff does make a lot of sense. In particular I'm very excited about the concept of using functions as first class constructs and some of the cool patterns that this allows.

To illustrate how I'm getting on with F# and to show some of the cool stuff that I've found most interesting, I've implemented an F# version of John Conway's game of life. If you've never come across this before it's was one of the first (if not the first) cellular automations and it's great fun to have a play with one of many on line implementations. I've been using the game of life as a great way of trying out new languages for years. It was both my first java program and even one of my first VB ones (all those years ago). The rules are dead simple: there are a grid of cells of arbitrary size and each cell can be either on or off (alive or dead). If a cell has less than two neighbours it dies from loneliness, if it has more than three, it dies from overcrowding. If a dead cell has exactly 3 neighbours it comes alive. You start off with a pattern of cells and then let the generations roll.

OK, let's have a look at the code. I'm going to paste it in segments with some commentary after each, and you can run each segment in F# interactive or put the whole thing in an F# code file and run it through the F# compiler. First let's define some data structures:

// the size of the grid
let size = 10

// make the grid of 10 X 10 cells initialize each to 0
let theGrid = Array.create_matrix size size 0

// a square array of array with the pattern to run
let thePattern = [|
    [|1;1;1|];
    [|1;0;0|];
    [|0;1;0|]|]

The 'let' statement defines a value (which can also be a function). F# programs seem to be just a collection of let statements as you build up a hierarchy of values defined with other values. It's this declarative style that is one of the hardest things to get your head around if you're an imperative programmer of many years like me. I've defined two arrays of arrays here, one for the initial grid and one for the initial pattern. There's a data structure called a 'list' which is lightweight immutable linked list and more in the functional tradition, but I've used a mutable array here, not because I want the mutability, but because you can address the cells by co-ordinate. The pattern I've set here is the famous 'Glider' that should move across the grid.

// define a function for iterating through a matrix (thanks to DeeJay)
let iteri_matrix f = Array.mapi (fun i -> Array.mapi (f i)) 

This function, iteri_matrix, iterates through my array of array applying a function that takes the co-ordinates of the current cell. I got it from a comment to 'a walk with a newbie', on HubFS which is the community site for F#. That post is a great introduction to a lot of good F# techniques, but I couldn't help thinking that the programming style was too imperative. In the same way that you can write procedural code in OO languages, you can write imperative code in F#. You can do it, but it's not particularly elegant.

The 'iteri_matrix' function illustrates one of the core features of functional programming: functions as first class constructs. iteri_matrix takes a function 'f' as its argument and returns a function. The function it returns consists of a static function Array.mapi that for each item (which is an array, since we want to run it over an array of arrays) applies a second Array.mapi that applies the function 'f' that we originally passed in to each item of the array. When you call iteri_matrix passing it a function argument and a matrix, it constructs a new function that will iterate over the matrix with the given function and then applies that function to the matrix. OK, it's a bit mind boggling, but really powerful when you get the hang of it.

// add the pattern to the middle of the grid
let offset = (size - thePattern.Length)/2
let getPatternCell (i,j) =
    if i < 0 || j < 0 then 0 
    elif i < thePattern.Length && j < thePattern.Length then thePattern.(i).(j)
    else 0
let addPatternToGrid grid = grid |> iteri_matrix(fun i j _ -> grid.(i).(j) + getPatternCell(i-offset, j-offset))

Next we add the initial pattern to the centre of the grid. First we define a function to get a pattern cell at a particular co-ordinate. It returns zero if the co-ordinates are out of range because we want to use it to sum the pattern and the grid without having to make the pattern matrix the same size as the grid matrix. Of note here is the use of a 'traditional' if.. elif ... else statement, later on we'll see another way of mapping conditionals called 'pattern matching' (not to be confused with regular expressions!). 

The second function, 'addPatternToGrid', takes a grid and uses iteri_matrix to sum the grid and the pattern (the offset value puts the pattern in the middle of the grid). A really cool feature of F# here is the '|>' operator which secret geek explains very nicely here. It allows you to chain functions very intuitively a bit like the way the pipe operator '|' works in most unix shell environments.

// define neighbours
let prev i = if i = 0 then size - 1 else i - 1
let next i = if i = size - 1 then 0 else i + 1
let neighbours (i,j) = [
    (prev i, prev j);
    (prev i, j);
    (prev i, next j);
    (i, prev j);
    (i, next j);
    (next i, prev j);
    (next i, j);
    (next i, next j)]

The next step is to define the neighbours of a cell. We're using a feature of F# called 'tuple' to define co-ordinates '(i,j)'. It's a really simple data structure that just groups together some values. The functions 'prev' and 'next' just define the previous and next numbers and 'wrap around' the matrix. The function 'neighbours' takes a co-ordinate and returns a list of the co-ordinates of the neighbours.

// sum a list
let rec sum aList =
    match aList with
    | [] -> 0
    | first::newList -> first + sum newList

This function simply sums a list, but it shows several interesting features. The first is that recursion is the preferred way of doing looping in functional languages. We've defined the function with 'let rec' which tells F# that this function is recursive (there must be a good reason for this, but why can't F# just figure that out for itself?) and we get the function to call itself passing the list minus it's first member. 'match aList with' is a match expression that defines a list of possible pattern matches for aList. The first match matches an empty list and returns zero, the second match strips off the first item from the list and then adds it to the sum of the remaining items.

// get the sum of the live neighbours
let cellAddNeighbours (i,j) grid = neighbours (i,j) |> List.map (fun (a,b) -> grid.(a).(b)) |> sum

// calculate neighbour sums for all cells
let addNeighbours grid = grid |> iteri_matrix (fun i j _ -> cellAddNeighbours (i,j) grid)

Next we work out the sum of all a cell's neighbours with the function 'cellAddNeighbours'. This takes a co-ordinate 'tuple' and a grid as arguments. We pass the cell co-ordinate to the neighbours function, get a list of neighbour co-ordinates back, pass that list to a static function of the List class, List.map, which runs a function on each item and returns a new list of results. Here we're just getting the cell value at the neighbour co-ordinate. The list of cell values is then passed to 'sum' which returns a single value of the sum of all the neighbours. The next function, 'addNeighbours', simply runs 'cellAddNeighbours' for each cell in the grid and returns a new grid of the sums. Once again it uses the iteri_matrix function we defined above.

// live or die rules for a single cell:
//      if the cell is alive and has 2 or 3 neighbours, then it lives, otherwise it dies
//      if the cell is dead and has 3 neighbours it comes alive
let cellLiveOrDie cellValue neighbourSum =
    match (cellValue, neighbourSum) with
    | (1,(2 | 3)) -> 1
    | (0, 3) -> 1
    | (_,_) -> 0

// calculate live or die for the whole grid
let liveOrDie grid neighbourSumGrid = grid |> iteri_matrix (fun i j _ -> cellLiveOrDie grid.(i).(j) neighbourSumGrid.(i).(j))

The next function 'cellLiveOrDie' contains the main set of rules for the game of life. Once again we're using pattern matching, but this time on a tuple. This is a really neat feature because it allows you to just list a set of rules for matching each element in a tuple without writing loads of conditional logic. The first rule says 'if the cellValue is 1 and the neighbourSum is 2 or 3 set the cell value to 1'. The second rule says 'if the cellValue is 0 and the neighbourSum is 3 set the cell value to 1. The third rule says, 'for any other combination, set the cell value to 0. The underscore '_' is a really useful bit of syntax that means 'any value'.

The second function here, 'liveOrDie' takes two matrixes, one of the current grid and one containing the neighbour sums that we calculated earlier. Once again it uses iteri_matrix to apply 'cellLiveOrDie' to each cell in the grid.

// print a cell
let printCell cell = 
    if cell = 1 then printf("X ") else printf("_ ")

// print the grid
let printGrid grid = 
    grid |> Array.iter (fun line -> (line |> Array.iter printCell); printf "\n");
    printf "\n\n"

This function 'printCell' prints a cell to the command line and 'printGrid' uses printCell to print the whole grid. There's a tiny bit of imperative programming in 'printGrid': '(line |> Array.iter printCell); printf "\n"', you can use a semicolon to separate statements just like in C and here we're using it to print a new line after each row of the grid.

// do n generations
let rec DoGenerations n grid =
    printGrid grid;
    match n with
    | 0 -> printf "end\n"
    | _ -> grid |> addNeighbours |> liveOrDie grid |> DoGenerations (n-1)

'DoGenerations' is the core loop of the application. Once again we're favoring recursion over looping in order to do the generations. I really like the clarity of the |> operator. Here we see how nice it is to simply chain together our previously defined functions 'addNeighbours' and 'liveOrDie' to create the next generation.

// run 10 generations
do theGrid |> addPatternToGrid |> DoGenerations 10

Finally, here is the program's 'Main()'. The 'do' statement simply runs the given expression. We take the initial grid, add the pre defined pattern and do 10 generations. The output should be something like this:

_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ X X X _ _ _ _ 
_ _ _ X _ _ _ _ _ _ 
_ _ _ _ X _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 


_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ X _ _ _ _ _ 
_ _ _ X X _ _ _ _ _ 
_ _ _ X _ X _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 

........ missing out a few generations here ......

_ _ X _ _ _ _ _ _ _ 
_ X X _ _ _ _ _ _ _ 
_ X _ X _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 


_ X X _ _ _ _ _ _ _ 
_ X _ X _ _ _ _ _ _ 
_ X _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ 


Monday, December 04, 2006

How to write an XSD

Web services are all about communicating with XML messages. The great benefit of XML is that it is a platform neutral technology. These messages shouldn't have any dependency on a particular web service implementation technology (such as .net or java). Unfortunately many of the implementation toolkits (especially ASP.NET) encourage you to think of web services as Remote Procedure Calls (RPC) which can inject unwanted dependencies on the toolkit and often leads to sub-optimal 'chatty' interfaces. That's why it's always best to define your messages using XSD rather than by getting your implementation toolkit (such as visual studio) to spit out type definitions based on your technology specific types (such as .net classes).

The question then becomes how to write effective XSDs. In this document I'd like to give a few pointers. The example for this demonstration is the following XML document:

<Order 
  xmlns="uri:ace-ina.com:schemas:order" 
  xmlns:prd="uri:ace-ina.com:schemas:product" 
  Id="0">
	<OrderLines>
		<OrderLine Id="0">
			<Product Id="0">
				<prd:Name>Bread</prd:Name>
				<prd:Price>0.79</prd:Price>
			</Product>
			<Quantity>2</Quantity>
			<Total>1.58</Total>
		</OrderLine>
		<OrderLine Id="1">
			<Product Id="2">
				<prd:Name>Milk</prd:Name>
				<prd:Price>0.48</prd:Price>
			</Product>
			<Quantity>1</Quantity>
			<Total>0.48</Total>
		</OrderLine>
	</OrderLines>
	<Total>2.06</Total>
</Order>

It's a simple order with an id and a collection of order lines. Each order line defines a product and gives the quantity and total. The namespace of the order is 'uri:ace-ina.com:schemas:order'. A bit of added complication is introduced by defining the product in a separate namespace: 'uri:ace-ina.com:schemas:product'.

Now let's create an XSD that defines the schema for this XML document. The XSD meta-schema is defined in the namespace: 'http://www.w3.org/2001/XMLSchema', and an XSD's root element is always 'schema', so let's start with that:

<xs:schema 
	xmlns:xs='http://www.w3.org/2001/XMLSchema'>
</xs:schema>

We also want to define the namespace of the target document which in this case is 'uri:ace-ina.com:schemas:order'. We need to include that namespace and reference it in the targetNamespace attribute. To enforce that all the defined elements in the XSD should belong to the target namespace we need to set elementFormDefault to 'qualified'.

<xs:schema 
	xmlns:xs='http://www.w3.org/2001/XMLSchema' 
	xmlns='uri:ace-ina.com:schemas:order' 
	targetNamespace = 'uri:ace-ina.com:schemas:order'
	elementFormDefault='qualified'>
</xs:schema>

Next we should define our types. Think of types in your XSD as entities in the same way as you would think of classes in a .net application or tables in a database. In the order document there are two primary types: 'Order' and 'OrderLine'. 'Product' belongs to a seperate namespace and XSD file and we'll be looking at that later. Types that contain attributes and/or elements are known as 'complex types' and are defined in a 'complexType' element. I like to name complex types '<name of target element>Type'. So let's add two complex types to our XSD, OrderType and OrderLineType:

<xs:schema 
	xmlns:xs='http://www.w3.org/2001/XMLSchema' 
	xmlns='uri:ace-ina.com:schemas:order' 
	targetNamespace = 'uri:ace-ina.com:schemas:order'
	elementFormDefault='qualified'>
	<xs:complexType name="OrderType">
	</xs:complexType>
	<xs:complexType name="OrderLineType">
	</xs:complexType>
</xs:schema>

We can add attributes to our types using the 'attribute' element. Both OrderType and OrderLineType have id attributes which we want to be required integer types:

<xs:schema 
	xmlns:xs='http://www.w3.org/2001/XMLSchema' 
	xmlns='uri:ace-ina.com:schemas:order' 
	targetNamespace = 'uri:ace-ina.com:schemas:order'
	elementFormDefault='qualified'>
	<xs:complexType name="OrderType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
	</xs:complexType>
	<xs:complexType name="OrderLineType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
	</xs:complexType>
</xs:schema>

Child elements can be defined as part of a 'sequence', 'choice' or 'all' containing element. 'Sequence' requires that all its elements exist in the given sequence in the target document, 'choice' allows only one of it's child elements to exist and 'all' requires that all or none of the defined elements exist, but that the order is not important. Repeating elements are not allowed in an 'all' group. minOccurs and maxOccurs are used to define optional and repeating elements. In this case we want to define 'OrderLines' and 'Total' for 'OrderType' and 'Product', 'Quantity' and 'Total' for 'OrderLine'. They are all required non-repeating elements so we don't need to specify minOccurs and maxOccurs (the default for both is '1') and we'll use 'Sequence' for all of them. We need to define the type of each element, both the OrderType and OrderLineType Total are defined as 'double' and Quantity is defined as 'integer'. We'll leave the types of 'OrderLines' and 'Product' until later:

<xs:schema 
	xmlns:xs='http://www.w3.org/2001/XMLSchema' 
	xmlns='uri:ace-ina.com:schemas:order' 
	targetNamespace = 'uri:ace-ina.com:schemas:order'
	elementFormDefault='qualified'>
	<xs:complexType name="OrderType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="OrderLines" type=""/>
			<xs:element name="Total" type="xs:double"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="OrderLineType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="Product" type=""/>
			<xs:element name="Quantity" type="xs:integer"/>
			<xs:element name="Total" type="xs:double"/>
		</xs:sequence>
	</xs:complexType>
</xs:schema>

Because Total has the same name and type in both OrderType and OrderLineType, we can factor out a global element called Total and reference it from inside OrderType and OrderLineType:

<xs:schema 
	xmlns:xs='http://www.w3.org/2001/XMLSchema' 
	xmlns='uri:ace-ina.com:schemas:order' 
	targetNamespace = 'uri:ace-ina.com:schemas:order'
	elementFormDefault='qualified'>
	<xs:element name="Total" type="xs:double" />
	<xs:complexType name="OrderType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="OrderLines" type=""/>
			<xs:element ref="Total"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="OrderLineType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="Product" type=""/>
			<xs:element name="Quantity" type="xs:integer"/>
			<xs:element ref="Total"/>
		</xs:sequence>
	</xs:complexType>
</xs:schema>

Now let's consider the OrderLines element in OrderType. In the target document, OrderLines contains a collection of OrderLine types, so we need to create a collection type for OrderLines. We can create a new complex type 'OrderLinesType' with a single repeating element 'OrderLine'. A repeating element is created by setting minOccurs to '0' and maxOccurs to 'unbounded'. We can then set the type of OrderLines to 'OrderLinesType'.

<xs:schema 
	xmlns:xs='http://www.w3.org/2001/XMLSchema' 
	xmlns='uri:ace-ina.com:schemas:order' 
	targetNamespace = 'uri:ace-ina.com:schemas:order'
	elementFormDefault='qualified'>
	<xs:element name="Total" type="xs:double" />
	<xs:complexType name="OrderType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="OrderLines" type="OrderLinesType"/>
			<xs:element ref="Total"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="OrderLineType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="Product" type=""/>
			<xs:element name="Quantity" type="xs:integer"/>
			<xs:element ref="Total"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="OrderLinesType">
		<xs:sequence>
			<xs:element name="OrderLine" type="OrderLineType" minOccurs="0" maxOccurs="unbounded"/>
		</xs:sequence>
	</xs:complexType>
</xs:schema>

We're still missing the product type. This is defined in a seperate namespace 'uri:ace-ina.com:schemas:product' in a seperate XSD document:

<xs:schema 
	xmlns:xs="http://www.w3.org/2001/XMLSchema" 
	xmlns="uri:ace-ina.com:schemas:product" 
	targetNamespace="uri:ace-ina.com:schemas:product" 
	elementFormDefault="qualified">
	<xs:complexType name="ProductType">
		<xs:sequence>
			<xs:element name="Name" type="xs:string"/>
			<xs:element name="Price" type="xs:double"/>
		</xs:sequence>
		<xs:attribute name="Id" type="xs:integer" use="required"/>
	</xs:complexType>
</xs:schema>

To reference a schema from another schema with a different namespace we use 'import'. To reference another XSD with the same namespace, use 'include'. Here the namespace is different so we need to add an 'import' element to our Order XSD. We also need to define the product namespace and give it a prefix, since we already have a default namespace (uri:ace-ina.com:schemas:order). We'll use 'prd' here. We can now define the Product element's type as 'prd:ProductType':

<xs:schema 
	xmlns:xs='http://www.w3.org/2001/XMLSchema' 
	xmlns='uri:ace-ina.com:schemas:order' 
	targetNamespace = 'uri:ace-ina.com:schemas:order'
	xmlns:prd='uri:ace-ina.com:schemas:product'
	elementFormDefault='qualified'>
	<xs:import namespace="uri:ace-ina.com:schemas:product" schemaLocation="Product.xsd" />
	<xs:element name="Total" type="xs:double" />
	<xs:complexType name="OrderType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="OrderLines" type="OrderLinesType"/>
			<xs:element ref="Total"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="OrderLineType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="Product" type="prd:ProductType"/>
			<xs:element name="Quantity" type="xs:integer"/>
			<xs:element ref="Total"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="OrderLinesType">
		<xs:sequence>
			<xs:element name="OrderLine" type="OrderLineType" minOccurs="0" maxOccurs="unbounded"/>
		</xs:sequence>
	</xs:complexType>
</xs:schema>

The last remaining task is to define our top level global element 'Order' with type 'OrderType':

<xs:schema 
	xmlns:xs='http://www.w3.org/2001/XMLSchema' 
	xmlns='uri:ace-ina.com:schemas:order' 
	targetNamespace = 'uri:ace-ina.com:schemas:order'
	xmlns:prd='uri:ace-ina.com:schemas:product'
	elementFormDefault='qualified'>
	<xs:import namespace="uri:ace-ina.com:schemas:product" schemaLocation="Product.xsd" />
	<xs:element name="Order" type="OrderType" />
	<xs:element name="Total" type="xs:double" />
	<xs:complexType name="OrderType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="OrderLines" type="OrderLinesType"/>
			<xs:element ref="Total"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="OrderLineType">
		<xs:attribute name="Id" type="xs:integer" use="required" />
		<xs:sequence>
			<xs:element name="Product" type="prd:ProductType"/>
			<xs:element name="Quantity" type="xs:integer"/>
			<xs:element ref="Total"/>
		</xs:sequence>
	</xs:complexType>
	<xs:complexType name="OrderLinesType">
		<xs:sequence>
			<xs:element name="OrderLine" type="OrderLineType" minOccurs="0" maxOccurs="unbounded"/>
		</xs:sequence>
	</xs:complexType>
</xs:schema>

Defining your web service message types in terms of XSD decouples your web service from a particular implementation technology and aids interoperability. Also, understanding the XSD syntax allows you to read and understand WSDL, create your own client proxies and control the serialization between your implementation types and the XSD.

For a much more complete and extensive discussion on writing XSD schemas see the world wide web consortium's XML Schema Part 0: Primer Second Edition