Tuesday, May 16, 2006

Qi Macros and Type Patterns

Taking a step back to the Magic Prime Type, we find that the pattern that we used to generate the prime type may be useful to encapsulate in some manner. The pattern goes as follows: If you have a base class like number, and you have a derived class such as prime, then you can create these "prime like" classes as long as you have a function that returns true for the basic and derived type. Simply put, if we could just swap different lexical elements into our prime datatype constructor, we could have a useful piece of code for creating exotic types such as the “even” type or perhaps the “catalan” type. Obviously, when we talk about placing lexical elements inside of code, we are talking macros!

So what would the Lisp-like macro look like for our “subtype-with-test” function? Something like the following, only better.

(DEFMACRO subtype-with-test ( BaseType NewType BaseTest NewTest)
`(datatype ,NewType
if(and ( ,BaseTest X ) ( ,NewTest X ) )
____
X : ,NewType ;

___
( ,NewTest X) : verified >> X : ,NewType ;

( subtype B A ) ; X : B ;
___
X : A;

___
(subtype ,NewType ,BaseType);
)
)

However, if you try to input that into Qi, you’ll find it will not make a compiling as something seems to happen to the ;. We need to use the "macroexpand" function mentioned in Appendix B of FPIQ language book. Fortunately for us, the syntax makes our final Qi macro a bit easier to read than the lisp version. We first create a pattern that looks like a function call that matches our macro. We’ll name it the same as above and it will take the same arguments.

(define macroexpand
[subtype-with-test BaseType NewType BaseTest NewTest] ->
[datatype NewType
if[and [ BaseTest X ]
[ NewTest X ]]
______
X : NewType ;

____
[ NewTest X ] : verified >> X : NewType ;

[ subtype B A ] ; X : B ;
___
X : A;

___
[ subtype NewType BaseType ]; ]
X -> X
)

And that’s all there is to it! Just remember to make sure to include the X-> X pattern at the bottom. Every single token receives a macroexpand so we need to ensure that it works as intended for all but our special "subtype-with-test" macro. Note that the pattern match serves as our macro definition. We define it in no other place.

Now doing the following creates a brand new type!

(subtype-with-test number prime integer? prime?)

We should note that the macroexpanded "subtype-with-test" prevents the evaluation of the 4 inputs. So we find the typical problem with variable capture. We don’t have the ability to name our classes X, A, or B!!! *laugh* Not a big loss, but we can modify the code and learn to deal with variable capture. We do this in the same way we deal with it in Lisp, namely gensyms.

define macroexpand
[subtype-with-test BaseType NewType BaseTest NewTest] ->
(let X (gensym "Temp")
(let A (gensym "Temp")
(let B (gensym "Temp")
[datatype NewType
if[and [ BaseTest X ]
[ NewTest X ]]
______
X : NewType ;

____
[ NewTest X ] : verified >> X : NewType ;

[ subtype B A ] ; X : B ;
___
X : A;

___
[ subtype NewType BaseType ]; ] )))

X -> X
)

One important point to note is that the "Temp" (or whatevever you want to name your gensym) must start in UPPER CASE! Otherwise, Qi won’t be able to tell it is a variable!

Qi does warn about these sorts of issues however, if you see the free variable warning in a macro, capture them with gensyms to turn this sort of output …

(0-) (load "test/pmacro2.qi")
======> Warning:
the following variables are free in macroexpand: B; X; A;


WARNING:
DEFUN/DEFMACRO: redefining function macroexpand in D:\tools\Qi 6.2\startup_qi.tx
t, was defined in D:\tools\Qi 6.2\Install Qi\Qi 6.3.txtmacroexpand

Real time: 0.0156303 sec.
Run time: 0.015625 sec.
Space: 185740 Bytes
loaded


into this much nicer output…

(0-) (load "test/pmacro3.qi")

WARNING:
DEFUN/DEFMACRO: redefining function macroexpand in D:\tools\Qi 6.2\startup_qi.tx
t, was defined in D:\tools\Qi 6.2\Install Qi\Qi 6.3.txtmacroexpand

Real time: 0.0312606 sec.
Run time: 0.03125 sec.
Space: 194036 Bytes
Loaded

Note the lack of compiler warnings about free variables. However it will always warn you about redefining macroexpand. Of course, since we are defining VARIABLES in the datatype, no real variable capture can happen. Our macro requires lower case symbols to even operate. But this example shows how to avoid the issue in the odd case where you want to make a macro that you can define different variable names inside it. (Can anyone think of a good reason to do this?)

So let us try this new macro out! We’ll create an even test and create a new type called "even" that can only type to actual even numbers.


(12+) (define even?
{ number --> boolean }
X -> (integer? (/ X 2)) )
even? : (number --> boolean)

(13+) (subtype-with-test number even integer? even?)
even : unit

(14+) 3 : even
error: type error

(15+) 4 : even
4 : even

(16+) 100 : even
100 : even

(17+) 101 : even
error: type error


I’d say that is an impressive job for one line of code! Let me know what you think!

Monday, May 15, 2006

Treaps in Qi

Delving back into the type system for this post, I thought I would take on the strange and wonderful Binary Search Tree known as the treap. For those that don’t know, a treap is a binary search tree that has a “priority” element attached to the key. It uses the random priority to ensure balancing of the tree. It does this by ensuring the tree is in binary search tree form with respect to the keys, and in heap form with respect to the priorities. We will create a simple type of string based treap in Qi and perhaps extend it to a more general version if I find the time to finish it. *grin*

The datatype for treap can be either list based or tuple based. I’ll begin with the tuple version as it shows off a bit more of the power of Qi pattern matching, but a list version would be more readable. We can signify empty leaves with the null list in either case.

_______
[] : treap;

And now we will create the recursive type. It takes a key and a priority and has a left branch and a right branch. Using the tuple to define the leaves means we place the key and priority in their own tuple and use that as the first element. Then we can put the left and right branches in their own tuple as the second element of our treap node. We’ll use the ever handy left/right sequent operator to ensure we can prove things about treaps using the datastructure only and typing the internal parts.

Key : string; Priority : number; LeftTreap : treap; RightTreap : treap;
===============================
(@p (@p Key Priority) (@p LeftTreap RightTreap)) : treap;


Now when we wish to insert a key into the treap, we must first make a random number to assign it for heap priority.

(define treap_insert
{string --> treap --> treap}
K Tree -> (treap_insert* K (- (random 40000000) 20000000) Tree))


Now the fun begins. The base case seems simple enough, just make a treap that looks like the above definition but has LeftTreap and RightTreap set to [].

(define treap_insert*
{string --> number --> treap --> treap}
K P [] -> (@p (@p K P) (@p [] []))

Now it could be that we are trying to insert an element that we already have. If so, we simply return the treap we had to begin with.


K P (@p (@p K* P*)
(@p L R))
-> (@p (@p K* P*) (@p L R)) where (= K K*)


Note that our pattern matched definition is well typed. L is a treap and R is a treap. K is a string and P is a number. We match the new key and priority as K and P. We match the treap node’s priority as K* and P* and the branches as L and R. If you mess up using them in your definition, the complier complains loudly.

Now we know the K is not equal to the current treap node, if it is less than the tree node we insert it somewhere on the left tree. AFTER we do this, then we order the treap by heap priority, preserving the BST ordering as we move our way up the tree. We’ll use the same pattern as the element above.

K P (@p (@p K* P*)
(@p L R))
-> (let Tree (@p (@p K* P*)
(@p (treap_insert* K P L)
R))
(if (> P* P)
(rotate_right Tree)
Tree))

where(str-lt K K*)

So if our new priority is less than the old one, we move it up the treap. That makes it reverse heap ordering, but we are not splitting hairs here. *grin*

So now we need to find out how to rotate a treap node up a treap. Rotate right takes the L branch and finds its left and right treaps and their keys. It then puts the left L treap as the current left, uses the left treap node’s key/priority as the current key/priority and makes a new right treap with the original left nodes right hand treap paired with the old right hand side, placing the old key/priority there. The code below serves as our heapify.

(define rotate_right
{treap --> treap}
(@p (@p Key Priority)
(@p (@p (@p K* P*)
(@p A1 A2))
B))

-> (@p (@p K* P*)
(@p A1
(@p (@p Key Priority)
(@p A2 B)) ))

)

Pattern matching makes the change short and sweet. Type checking ensures that we have a valid treap at the end.

Now we can finish our definition of treap_insert with the right hand insert

K P (@p (@p K* P*)
(@p L R))

-> (let Tree (@p (@p K* P*)
(@p L
(treap_insert* K P R)))
(if (> P* P)
(rotate_left Tree)
Tree))


We can make a rotate_left along the lines of the rotate_right

(define rotate_left
{treap --> treap}
(@p (@p Key Priority)
(@p A
(@p (@p K* P*)
(@p B1 B2)))) -> (@p (@p K* P*)
(@p
(@p (@p Key Priority)
(@p A B1))
B2))
)

All we need to do now is define a string less than function str-lt that compares strings. And here we run into a problem. We have no type safe way of comparing strings in Qi. So we’ll have to modify Qi to know about our version of the string compare function. We’ll build it from the ground up by giving CHAR-CODE a type. In deference to Haskell, let us name the function “ord”.

You can patch it with

42c42
<> occurrences occurs-check or output prf print profile profile-results prooft
ool provable?
47c47
<> track tuple? undebug unprf union unprofile untrack value version
1527c1527
<> number? 1 occurs-check 1 occurrences 2 or 2 prf 1 print 1 profile 1 profil
e-results 1 prooftool 1
1586,1587c1586
< (boolean --> (boolean --> boolean)) ord (character --> number)
<> B) --> (A --> B))
---
> (boolean --> (boolean --> boolean)) prf ((A --> B) --> (A --> B))
2397,2398d2395
< (DEFUN ord (x) (CHAR-CODE x)) <> string --> boolean}

But it would help to simply search for the word “difference” as it is an internal function used only 3 times in the program outside its own definition and add the code yourself to learn how. There you find the arity list that you must add the ord function too and the type list where you specify the type. Then you can add the DEFUN for ord by the other system function defuns. Run the installer and you will get a version of Qi that has our “ord” function.


--- EDIT ---

I'm adding a much better way of doing this in Qi that Mark pointed out on the mailing list.

Lisp contains a function STRING<, so if you want to bring this into Qi,
then the following will do it.

(0-) (define string<
X Y -> (if (empty? (STRING< X Y)) false true))
======> Warning:
the following variables are free in string<: STRING<;
string<

(1-) (newfuntype string< (string --> string --> boolean))
string<

(2-) (tc +)
true

(3+) string<
string< : (string --> (string --> boolean))

(4+) (string< "ghgjj" "ghhhg")
true : boolean

--- EDIT END ---


Now we can define our string less than function, “str-lt”.

(define str-lt
{string --> string --> boolean}
S1 S2 -> (str-lt* (explode S1) (explode S2)))

(define str-lt*
{(list character) --> (list character) --> boolean}
[] _ -> false
_ [] -> true
[X |_] [Y | _] -> true where (< (ord X) (ord Y))
[_ |L1] [_ | L2] -> (str-lt* L1 L2)
)


And now we have a working, input-only treap. I think at this point I will take a break and perhaps return the program efficiency and changing of the syntax in another article. Let me know of any questions or comments you have so far please!

Here is the full file. I'll create a place to download my code sometime soon.


(datatype treap

_______
[] : treap;


Key : string; Priority : number; LeftTreap : treap; RightTreap : treap;
===============================
(@p (@p Key Priority) (@p LeftTreap RightTreap)) : treap;
)


(define treap_empty?
{ treap --> boolean }
Tree -> true where (= Tree [])
Tree -> false
)

(define rotate_left
{treap --> treap}
(@p (@p Key Priority)
(@p A
(@p (@p K* P*)
(@p B1 B2)))) -> (@p (@p K* P*)
(@p
(@p (@p Key Priority)
(@p A B1))
B2))
)

(define rotate_right
{treap --> treap}
(@p (@p Key Priority)
(@p (@p (@p K* P*)
(@p A1 A2))
B))

-> (@p (@p K* P*)
(@p A1
(@p (@p Key Priority)
(@p A2 B))
))
)

(define treap_insert
{string --> treap --> treap}
K T -> (treap_insert* K (- (random 40000000) 20000000) T))

(define treap_insert*
{string --> number --> treap --> treap}
K P [] -> (@p (@p K P) (@p [] []))

K P (@p (@p K* P*)
(@p L R))
-> (@p (@p K* P*) (@p L R)) where (= K K*)

K P (@p (@p K* P*)
(@p L R))
-> (let Tree (@p (@p K* P*)
(@p (treap_insert* K P L)
R))
(if (> P* P)
(rotate_right Tree)
Tree))

where(str-lt K K*)

K P (@p (@p K* P*)
(@p L R))

-> (let Tree (@p (@p K* P*)
(@p L
(treap_insert* K P R)))
(if (> P* P)
(rotate_left Tree)
Tree))
)


(define str-lt
{string --> string --> boolean}
S1 S2 -> (str-lt* (explode S1) (explode S2)))

(define str-lt*
{(list character) --> (list character) --> boolean}
[] _ -> false
_ [] -> true
[X |_] [Y | _] -> true where (< (ord X) (ord Y))
[_ |L1] [_ | L2] -> (str-lt* L1 L2)
)

Monday, May 08, 2006

Qi Optimization and Catalan Fun!

We start this week by delving a bit into number theory and seeing what we can accomplish with Qi. We will start with a simple approach to the problem and then perform some optimizations on our program. We will use pattern matching, backtracking, and arrays to enhance the performance of our sequence in order to better understand Qi. Hopefully, this process will show the powers of Qi to search complicated number spaces and what limitations we run into when using Qi in general as well.

Let us start with the empty string, "". We can define this as 0. If we put a pair of parenthesis around the empty string, we get the new string "()". We can call this string 1. Putting a pair of parenthesis around the 1 gives us "(())" or what we will call 2. Alternatively, we can concatenate 1 to 1 to get another version of the string, "()()". Doing this for 3 gives us "((()))" , "()()()" , "(()())", "(())()" and "()(())". These obviously represent basic addition facts if you think of concatenation as + and the parenthesis as + 1. Or we can think of these strings as the number of ways we can parenthesize a string of length N. We have defined these strings via 2 basic operations, either by interning previous strings with parenthesis or by either concatenation be it left-handed or right-handed. This feels reminiscent of the S and K combinators, but perhaps that’s a topic for another time.

So from that description, we can devise a simple Qi program to output those strings. We will call this function get-cstrings. The base case is 0 and we define it as the empty string.

(define get-cstrings
0 -> [""]

Now for the recursive case, we either perform the "interning" operation by putting a pair of parentheses around the previously created number, or by taking any previous number combination that adds to our number and concatenating them together. We define the intern function as intern-all which takes a list of strings and interns each string. We also define the left and right concatenation function as lr-apply. It takes in the interned strings and adds the concatenation strings to it. This gives us the full function definition as follows.

(define get-cstrings
0 -> [""]
N -> (let NewList (intern-all (get-cstrings (- N 1) ))
(lr-apply N NewList))
)

Turing to intern all, we need to take a list and add the parentheses around each element. So we use map to apply a function across the list. The function should take a string, and add parentheses around the front and back. Using this opportunity to apply the Lisp underbelly of Qi seems fun at this point, and gives us more to optimize later. So we take the string, turn it into a list, append the parentheses, and then turn it back into a string.

(define intern-all
Xs ->
(map
(/. Elem
(COERCE (append [ #\( | (COERCE Elem LIST) ] [ #\) ] ) STRING))
Xs
)
)

Turing our attention to lr-apply now, we note that it walks from 1 up to n-1, forming new strings from pairs of older strings by concatenating them to each other. So we will start with lr-apply

(define lr-apply
0 Xs -> Xs
N Xs -> (lr-apply* 1 (- N 1) Xs))

And now we need to define the recursive lr-apply* that creates these strings. It takes in the left number and the right number, along with the strings we have built so far. It adds the new strings to the final list. When the right number becomes greater than the left number, we stop. Using the helper function lr-work, we try the following definition.

(define lr-apply*
L R Final ->
(let A (get-cstrings L)
(let B (get-cstrings R)
(lr-apply* (+ L 1) (- R 1) (lr-work A B Final))))
where (>= R L)
_ _ Final -> Final
)

The lr-work function takes the left list of strings and the right list of strings and combines them and adds them to the final list. It uses a helper function pair-this to concatenate an element to each element of the second list. If we had access to the Haskell prelude, we could define this more concisely. You can check the google group for a beginning implementation.

(define lr-work
[] _ Final -> Final
[X | Xs] Ys Final -> (lr-work Xs Ys (pair-this X Ys Final))
)

(define pair-this
_ [] Final -> Final
X [Y|Xs] Final -> (pair-this X Xs (make-ap-pair X Y Final))
)

Finally, we have a function called make-ap-pair that takes 2 strings and returns their left and right concatenation. It uses a helper function named give-unique to ensure that the element has not already been placed in the list. We’ll use the Lisp function concatenate instead of any Qi functions.

(define make-ap-pair
L-num R-num Xs -> (let AfterLeft (give-unique [(CONCATENATE STRING L-num R-num)] Xs)
(give-unique [(CONCATENATE STRING R-num L-num)] AfterLeft))
)

(define give-unique
[] Xs -> Xs
[X | Left] Xs -> (give-unique Left Xs) where (element? X Xs)
[X | Left ] Xs -> (give-unique Left [ X | Xs ])
)

Running this gives:

(2-) (get-cstrings 0)
[""]

(3-) (get-cstrings 1)
["()"]

(4-) (get-cstrings 2)
["()()" "(())"]

(5-) (get-cstrings 3)
["(())()" "()(())" "()()()" "(()())" "((()))"]

This function obviously grows the list very quickly. Look at the following.

(6-) (LENGTH (get-cstrings 3))
5

(7-) (LENGTH (get-cstrings 4))
14

(8-) (LENGTH (get-cstrings 5))
42

(9-) (LENGTH (get-cstrings 6))
132

(10-) (LENGTH (get-cstrings 7))
429

(11-) (LENGTH (get-cstrings 8))
1430

(12-) (LENGTH (get-cstrings 9))
4862

And c-strings 9 starts to take some noticeable time on my personal box.

(13-) (time (LENGTH (get-cstrings 9)))

Real time: 4.984375 sec.
Run time: 4.953125 sec.
Space: 3335576 Bytes
GC: 6, GC time: 0.0 sec.4862

(14-) (time (LENGTH (get-cstrings 10)))

Real time: 63.984375 sec.
Run time: 63.203125 sec.
Space: 12635520 Bytes
GC: 19, GC time: 0.0625 sec.16796

Obviously, we want to do better. So let us note that the function calls the previous function many times. If we memorize the array, we can find the previous runs with no difficulty. We will use a global array to hold the strings up to 10 named *clst*. We use backtracking to check to see if we already have memorized the result, if not, we compute it.

(define get-pstrings
0 -> [""]
N <- (get-array (value *clst*) [N] #\Escape)
N -> (append (intern-all (get-pstrings (- N 1)) []) (lr-apply N []) ) where (> N 10)
N -> (let NewList (intern-all (get-pstrings (- N 1)) [])
(let LrList (lr-apply N [])
(put-array (value *clst*) [N] (append NewList LrList) )))
)

We can change the intern-all to make more sense and run much faster. Map ends up being slower than simply defining it recursively.

(define intern-all
[] Out -> Out
[X | Xs] Out -> (intern-all Xs [(CONCATENATE STRING "(" X ")") | Out] )
)

These simple changes gives us twice the performance.

(10-) (time (LENGTH (get-pstrings 9)))

Real time: 2.53125 sec.
Run time: 2.515625 sec.
Space: 730696 Bytes
GC: 2, GC time: 0.015625 sec.4862

(11-) (time (LENGTH (get-pstrings 10)))

Real time: 33.609375 sec.
Run time: 33.265625 sec.
Space: 2969960 Bytes
GC: 4, GC time: 0.03125 sec.16796


Many more obvious improvements can be made, but that suffices to show how you can flesh out a correct answer then modify it for efficiency.

Of course, if you took this sequence of integers and search for them you’ll find that this is a well known sequence called A000108, Segner numbers, or more often Catalan numbers.
Our favorite number list
has the following interesting definition.

a(n) = binomial(2n, n)-binomial(2n, n-1)

We can easily understand how the binomial(2n,n) is part of the result, as we have strings of N left parentheses and N right parentheses. We must choose equal pairs of parentheses so we have N as the bottom element of the function. However, we must remove the unbalanced parenthesis so we take out the incorrect elements via subtracting the number of strings that are "unbalanced" or one off of balanced. Hence we find the binomial (2n, n-1).

However, if we think about this problem as we have programmed it above, we can come up with an equivalent definition that may help us understand the binomial function. In the program above, it takes the a(n-1) and interns each element, returning the same number. It then builds lists from combinations of previous results. So we have the equation

a(n) = a(n-1) + some_binomial_function

So let us determine what this binomial function turns out to be. Remembering Pascal’s rule of binomial(n-1,k)+binomial(n-1,k-1)=binomial(n,k) allows us to reduce the original definition into the following, with b for binomial.

a(n) = b(2(n-1),n-1) + b(2(n-1),n-2) + b(2(n-1),n-1) + b(2(n-1),n)
–b(2(n-1),n-3) - b(2(n-1),n-2)- b(2(n-1),n-2) - b(2(n-1),n-1)

And remembering the b(n,r)=b(n,n-r) gives us the following reduction.

a(n) = binomial(2(n-1), n-1) – binomial(2(n-1),n-3)

So now we found part of an a(n-1), thus we can find our some_binomial_function from

a(n) = binomial(2(n-1), n-1) –binomial(2(n-1),n-2) + some_binomial_function

some_binomial_functions = binomial(2(n-1),n-2) – binomial(2(n-1),n-3)

So we can see that the first part of the binomial function cancels out with the last part of the a(n-1) function, leaving only the final part as a subtraction. Thinking about what this "some_binomial_function" means, we can see that we take the substrings that we can add together to get a bigger substring, minus the number of invalid and duplicate substrings. This gives us the final version that computes our Qi functions output length.

a(n) = a(n-1) + binomial(2(n-1),n-2) – binomial(2(n-1),n-3)

Of course, we can make more optimizations and go further into the Catalan numbers, which I may do in the future. For now, I hope this serves as a fun introduction to Qi optimization and the always interesting Catalan sequence. Let me know what you think!