Documentation
Not logged in

Copyright (C) 2009-2010, Trevor Davel <twylite AT crypt DOT co DOT za>
See the file "LICENSE.txt" (Tcl/Tk License) for information on usage and redistribution of this file, and for a DISCLAIMER OF ALL WARRANTIES.


Navigation: control::functional manual | TIP: Higher-order functions in Tcl.

The design of control::functional (strawman)

Start at the beginning: what is a 'function' in Tcl?

In reviewing the approaches to functional programming and/or higher-order functions in Tcl it struck me that while the concept of 'a function' is well defined in FP, many of the suggestions of implementations of "higher-order functions" had only a loose idea of what 'a function' may be. Some implementations only support an expression, for example. Others will work with lambdas but not with named functions. Others require the caller to pay careful attention to quoting in order to execute the right thing.

In Tcl there are many things that look like a function: they take one or more arguments and return a value. For example:

If we want to support higher-order functions, I believe it is necessary to have a consistent model of what 'a function' is, and how to invoke 'a function' with arguments.

Ralph Emerson said

"A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines."

My corollary is

"A foolish inconsistency is the bane of engineers."

Requirements & ideals

Given something that is passed to or returned from a higher-order operator and which purports to be 'a function' we should be able to:

  1. Invoke the function, with arguments, and obtain a normal Tcl result (i.e. a return value or an exception code).
  2. Construct a new function through currying or partial application of the function, and to compose functions.
  3. Pass the 'function' to another higher-order function that has these same expectations.
  4. Return the function.

It is highly desirable that all 'function-like things' get invoked in the same way. If this is not the case then either (i) every higher-order function must implement logic to determine the type of function and know how to implement it; or (ii) only a subset of Tcl's 'function-like things' may be used with higher-order functions.

In the perfect world we would be able to hand such a function to existing Tcl and Tk commands that take callbacks, for example interp alias, interp bgerror, chan event, lsort, socket, and trace. Callbacks are often the first place that developers new to Tcl encounter the complexity of quoting, dynamically rewriting scripts, and evaluating in the wrong namespace. An ideal definition of 'function' would help us to reduce this complexity (for example, lambda could be a drop-in replacement for namespace code {...}). At the very least we should try to avoid creating a system with similar complexities by recognising that developers will have expectations on the context in which a lambda will execute.


Evolving functional programming in Tcl

Heavily guided by existing attempts at higher-order functions and lambda in Tcl, let us start with the notion of a Tcl function that may receive a function as an argument:

  proc map {func list} {
    # for each item in list, call func with item as the parameter, and append
    # the result to an output list
  }

We can pass a named function to map by name:

  proc plusone {v} {
    expr { $v + 1 }
  }
  
  map plusone {1 2 3 4 5}
  # returns 2 3 4 5 6

Leading to the following implementation of map:

  proc map {func list} {
    set outlist {}
    foreach item $list {
      lappend outlist [$func $item]
    }
    return $outlist
  }

Tcl also supports anonymous (unnamed) functions (since 8.5). These functions must be called using apply, which requires a subtly different implementation of map:

  proc map {anonfunc list} {
    set outlist {}
    foreach item $list {
      lappend outlist [apply $anonfunc $item]
    }
    return $outlist
  }

  map {{v} { expr { $v + 1 } }} {1 2 3 4 5}

This difference in handling between named and anonymous functions is undesirable. Any higher-order function would need to know whether it was passed a named or anonymous function, and choose the appropriate logic to invoke the function.

Some suggestions on the Tclers Wiki treat the function as a list and use a heuristic based on the length of the list to decide how to invoke the function. A short excursion down this path shows problems handling ensembles and objects, and possible difficult with currying.

A better solution may be to make named and anonymous functions look the same when passed to a higher-order function, by which we mean "be invoked in the same way".

An obvious answer (since Tcl 8.5) is to prefix the anonymous function with apply, so that eval will behave correctly for both named and anonymous cases:

  proc map {func list} {
    set outlist {}
    foreach item $list {
      lappend outlist [eval $func [list $item]]
    }
    return $outlist
  }

  proc plusone {v} {
    expr { $v + 1 }
  }

  map plusone {1 2 3 4 5}
  map {apply {{v} { expr { $v + 1 } }}} {1 2 3 4 5}

Notice the quoting of the argument to eval in the implementation of map: the func must be treated as a list such that the effective command becomes apply in the case of an anonymous function, while the arguments to the function must be protected against list expension.

As a consequence of this quoting, the actual command executed by the higher-order function is not $func but [lindex $func 0]. For most named functions [map $name $list] will work as expected because for simple strings not containing whitespace [lindex $func 0] eq $func. When $name is itself a list or can be interpreted as a list we will need to quote $name correctly, i.e. [map [list $name] $list].

In effect map no longer takes a function as an argument, but a list that is evaulated as a Tcl script. Arguments to the function are appended to the list (one element per argument) before evaulation. To reflect this change we will refer to the function argument as a command prefix.

It is also worth noting that eval will cause the command prefix to be evaluated in the context of map, while the programmer probably expects it to be evaluated in the calling context. This can be fixed easily using uplevel.

  proc map {cmdprefix list} {
    set outlist {}
    foreach item $list {
      lappend outlist [uplevel 1 $cmdprefix [list $item]]
    }
    return $outlist
  }

What uplevel is doing here is allowing commands and variables to be resolved in the caller's context. This is of questionable value if cmdprefix is a named proc or a lambda, but if cmdprefix is a script then it will be invoked in an unsurprising context. This will also allow us to abuse higher-order functions in interesting ways, for example:

  # Python's enumerate
  # enumerate {a b c d e} -> {1 a} {2 b} {3 c} {4 d} {5 e}
  proc enumerate {list} {
    map {concat [incr x]} $list
  }

While we have simplified the implementation of map so that it doesn't need to distinguish between named and anonymous functions, using an anonymous function with map has become uglier:

  map {apply {{v} { expr { $v + 1 } }}} {1 2 3 4 5}

We can improve the visual appearance by creating helpers to build anonymous procs and functions:

  proc lambda {arglist body} {
    # We're ignoring the namespace, for now
    list apply [list $arglist $body]
  }
  
  proc lambdaexpr {arglist exprbody} {
    list apply [list $arglist [list expr $exprbody]]
  }
  
  map [lambdaexpr {v} { $v + 1 }] {1 2 3 4 5}

So far so good, but there is another way to represent the "plusone" operation: using infix operators.

  map [list ::tcl::mathop::+ 1] {1 2 3 4 5}

Here we are using a named proc (::tcl::mathop::+) and an argument as the command prefix. The command prefix is no longer just a function, but a function with arguments! This is a highly desirable side-effect of our use of command prefixes, and is known as partial application or (in a more general form described below) currying.

What happens when we try to currying anonymous functions?

  # A straightforward call
  set anonfunc [lambdaexpr {v} { $v + 1 }]
  map $anonfunc {1 2 3 4 5}
  
  # Currying
  set anonfunc [lambdaexpr {n v} { $v + $n }]
  map [list $anonfunc 1] {1 2 3 4 5}
  # gives error 'invalid command name "apply {{n v} {expr { $v + $n }}}"'

That didn't work as we may have expected. Returning to the issue of quoting helps us understand why: the command that is executed is [lindex $cmdprefix 0], which in this case is $anonfunc, and in the named argument case is "::tcl::mathop::+". To make currying work we want to take $anonfunc (a list) and lappend the arguments. This will also work for named functions, and even for named functions where the name can be interpreted as a list.

  map [linsert ::tcl::mathop::+ end 1] {1 2 3 4 5}
  map [linsert $anonfunc end 1] {1 2 3 4 5}

A helper will make this easier. curry will take a function and an argument, and return a function that is the partial application of the argument to the input function:

  proc curry {cmdprefix arg} {
    lappend cmdprefix $arg
  }      

  set anonfunc [lambdaexpr {n v} { $v + $n }]
  map [curry $anonfunc 1] {1 2 3 4 5}
  
  map [curry ::tcl::mathop::+ 1] {1 2 3 4 5}

We can curry named or anonymous functions to create new (anonymous) functions. If we've got this right, we should even be able to curry a curry.

  set cfunc [curry concat alpha]
  map [curry $cfunc beta] {1 2 3 4 5}

A practical limitation of this definition of curry is that we can only partially apply the left-most argument(s) to create the curried function.

Cross-check

A strawman has been evolved for lambda, curry and map, based on a command prefix as a 'function'.

It is worth cross-checking this model against our design goals, existing implementations, and past failures.

We will start by putting map through its paces with various 'function-like things':

  # Our strawman definitions (lambda now supports namespaces)
  
    proc map {cmdprefix list} {
      set outlist {}
      foreach item $list {
        lappend outlist [uplevel 1 $cmdprefix [list $item]]
      }
      return $outlist
    }
  
    proc lambda {arglist body} {
      # Upgraded to support namespace
      set ns [uplevel 1 namespace current]
      list apply [list $arglist $body $ns]
    }
  
    proc curry {cmdprefix arg} {
      lappend cmdprefix $arg
    }      


  # Named proc - we will use ::tcl::mathop::+
    map [curry ::tcl::mathop::+ 1] {1 2 3 4 5}
    # strictly this should be [curry [list ::tcl::mathop::+1] 1], as the quoting
    # will behave correctly in all cases
    set func [list ::tcl::mathop::+]
    map [curry $func 1] {1 2 3 4 5}

  # Named proc in namespace with spaces in name
    namespace eval test {
      proc "nasty mul" {m v} {
        expr { $m * $v }
      }
    }
    namespace eval test {
      map [curry "nasty mul" 1] {1 2 3 4 5} 
        # FAILS (expected) - what we passed to [curry] isn't a function as we have defined it
    }
    namespace eval test {
      map [curry [list "nasty mul"] 1] {1 2 3 4 5}
    }

  # Anonymous proc
    map [lambda {x} { expr { $x + 1 } }] {1 2 3 4 5}
    
  # Anonymous proc in a namespace
    namespace eval test {
      map [lambda {x} { "nasty mul" 2 $x }] {1 2 3 4 5}
    }
    namespace eval test {
      map [curry [lambda {x y} { "nasty mul" $x $y }] 2] {1 2 3 4 5}
    }
    
  # Ensemble
    namespace eval test {
      proc mul {x y} { expr { $x * $y } }
      namespace export mul
      namespace ensemble create
    }
    map [list test mul 2] {1 2 3 4 5}
    map [curry [list test mul] 2] {1 2 3 4 5}
    set cmdprefix [list test mul]
    map [curry $cmdprefix 1] {1 2 3 4 5}
    
  # Object
    set obj [oo::object new]
    oo::objdefine $obj method mul {x y} { expr { $x * $y } }
    map [list $obj mul 3] {1 2 3 4 5}
    map [curry [list $obj mul] 3] {1 2 3 4 5}

  # Script
    map "list x" {1 2 3 4 5}
    set y 10
    map {list [incr y]} {1 2 3 4 5}
    
    proc test_script {} {
      set z 20
      map {list [incr z]} {2 3 4 5 6}
    }
    test_script

  # Execute a cmdprefix (in the current context)
    set plusone [lambda {x} { expr { $x + 1 } }]
    {*}$plusone 10
    set plusone [list ::tcl::mathop::+ 1]
    {*}$plusone 10
    set plus [list ::tcl::mathop::+]
    set plusone [curry $plus 1]
    {*}$plusone 10
    
  
  # Execute a cmdprefix in the caller's context - refer to [map] impl.
  
  # FAIL
    map {break;} {1 2 3 4 5}

We're not completely free from quoting hell, but we can reliably construct a cmdprefix using some simple rules:

  1. To promote a command (proc name, ensemble, object) to a cmdprefix, list it.
  2. To promote an anonymous function (i.e. {{arglist} {body}}) to a cmdprefix, use lambda.
  3. Once a cmdprefix, use curry or [linsert $cmdprefix end arg1 ...].

And we can reliably invoke a cmdprefix (with arguments) using these rules:

  1. Invoke a cmdprefix in the current scope as {*}$function ?arg1 arg2 ...?.
  2. Invoke a cmdprefix in the caller's scope as uplevel 1 $function ?[list arg1 arg2 ...]?.

Considering our design goals:

Considering past lessions:

Comparison to Tcllib struct::list

  # Chain map over filter, a poor-man's list comprehension (conceptual code)
    # This proposal
    map $func [filter $pred $inputset]
    
    # struct::list - the verb "map $func" has been split, making the operation
    # less obvious
    map [filter $inputset $pred] $func
    
  # Sum using curry (conceptual code)
    # This proposal (curry extended to allow multiple args)
    set sum [curry [list foldl] [list ::tcl::mathop::+] 0]
    
    # struct::list
    proc sum {lst} {
      fold $lst 0 [list ::tcl::mathop::+]
    }

Comparison to Wub functional-1.0.tm

  # Call [[map]] with a named procedure
    proc "nasty mul" {m v} {
      expr { $m * $v }
    }
    
    # This proposal
    map [curry [list "nasty mul"] 2] {1 2 3 4 5}
    
    # functional.tm
    map [curry [list [list "nasty mul"]] 2] {1 2 3 4 5}
    map [list [list "nasty mul"] 2] {1 2 3 4 5} ;# curry using [list]
    
  # Invoke a cmdprefix in the current scope
    
    # This proposal
    set cmdprefix [curry [list "nasty mul"] 2]
    {*}$cmdprefix 5
    
    # functional.tm
    set cmdprefix [curry [list [list "nasty mul"]] 2]
    eval {*}$cmdprefix [list 5]

Comparison to funtcl

Summary

Conclusion

We seem to be on the right path. The proposal is similar to existing extensions in both API and implementation, and conforms to the expectations of TIP 194.

The differences to existing implementations are attributable to perceived syntax improvements and/or a more consistent (or at least equally consistent but slightly simpler) model of 'a function'. I believe the benefits justify a decision not to propose an existing extension for inclusion in the core as-is.


A model for 'functions'

Working from the strawman the following definition is adopted:

A 'function' in Tcl (taken as input to or returned as output from a higher-order operator) is defined here as a command prefix:

Then follow these rules when handling 'functions':

  1. To promote a command (proc name, ensemble, object) to a cmdprefix, list it.
  2. To promote an anonymous function (i.e. {{arglist} {body}}) to a cmdprefix, use lambda
  3. Any function may be curried by lappending one or more args to the cmdprefix. This can be accomplished using curry or [lindex $cmdprefix end arg1 ...].
  4. Invoke a cmdprefix (with extra arguments) in the current scope as {*}$function ?arg1 arg2 ...?.
  5. Invoke a cmdprefix in the caller's scope as uplevel 1 $function ?[list arg1 arg2 ...]?.

The design of control::functional (stoneman)

The strawman that has been developed is rather limited. We now present the stoneman version. Further improvements leading to an ironman version may arise from discussions around the TIP proposal.

Context of execution of cmdprefix

These higher-order functions are nominally similar to imperative looping constructs, and programmers familar with foreach will expect the cmdprefix to be handled much like the body of a foreach loop.

In keeping with the Principle of Least Surprise, the cmdprefix is executed in the context of the caller of the higher-order function.

Handling side-effects

In its ideal form Functional Programming is meant to be free of side-effects. Tcl is not limited to the functional paradigm however - side-effects will occur, and the semantics of higher-order operations must be clearly defined to handle such side-effects.

Error handling

If a Tcl error (or any other non-zero return code) is encountered, the higher-order operation will abort and the error will be propagated to the caller.

Continue/break handling

continue and break are special cases of "error handling". Imperative programmers will expect continue/break to have defined interactions with loops.

These higher-order functions are not loops - they are operations that may be implemented iteratively or recursively and can do loop-like things, but they are not loops. Consequently there should be no special behaviour defined for break or continue.

Moreover break/continue in Tcl cannot propagate through a call frame, making the issue moot in most (if not all) practical applications.

At the same time we recognise that these higher-order functions may be implemented iteratively in terms of existing Tcl loops, and that those loops have defined interactions with break/continue. Specifying semantics for the higher-order functions may complicate implementation and - in the worst case - impact on the performance of a bytecoded implementation.

Therefore we take the approach that interaction with continue and break is undefined, and strictly a consequence of the implementation. Implementations may support these return codes, or promote them to TCL_ERROR (1) as if called outside a supporting construct; but it is not acceptable to ignore them!

Altering input lists during execution

As with foreach, altering the input lists during execution has no effect on the operation. The higher-order function effectively iterates over a read-only copy of the input list(s).

This should not be unexpected: the input lists are passed to the function as values, not references (varnames), so Tcl's parsing rules dictate that the value is fixed before the command is evaluated.

Order of processing (of input lists)

Some higher-order functions are inherently ordered (e.g. fold), others are not (e.g. filter). Bearing in mind the potential application of higher-order functions to scalable systems, and the opportunity to implement non-ordered functions using parallel computation, it is not desirable to specify an order of processing where such an order is not strictly required.

Therefore, for functions that do not inherently require a particular order of processing, the input list(s) may be processed may be processed in any order (perhaps even in parallel). The actual order of processing is a side-effect of implementation.

Side-effects of cmdprefix should neither expect nor rely on a particular order of iteration.

This point is largely philosophical. I do not forsee a parallel implementation of filter or map in Tcl, and I fully expect that programmers will discover (and come to expect) the ordering characteristics of the particular implementation they are using. What I would like to do though is encourage good practices and thinking around the use of higher-order constructs. We are facing a future in which parallel processing is increasingly important, and we have to be careful about continuing to think about algorithms only in terms of iteration or recursion.

Stoneman: lambda and curry

The notion of a command prefix is widely used in the Tcl core and Tk to handle callbacks. A common mistake in creating callbacks is to cause the command prefix to execute in the wrong namespace. Following the Principle of Least Surprise, a lambda should capture the context in which it is created and execute in that context by default.

Since Tcl does not have closures, the concept of "capture the context" is limited to the namespace in which the caller is executing at the time the lambda is created.

Wub's functional.tm provides this feature: the creator's namespace is discovered and the lambda will execute in that namespace.

The Wub module also provides another enhancement: additional arguments may be supplied when the lambda is created, and they immediately curry the lambda. This is potentially useful, so we will adopt it.

A number of examples on the Wiki deal specifically with expressions. As we have already seen wrapping an expression in a lambda can get rather ugly, so we definitely want to keep lambdaexpr around.

Stoneman: fold

Looking at the Wikipedia's entry for fold we find the following:

We have already established that we prefer the function (cmdprefix) on the left. We would like to provide the broadest possible support - left and right fold, with and without initial value - and main a Tclish feel (it seems reasonable to favour an optional initial value rather than having separate functions). The names foldl and foldr are widely recognised and not ambigious.

Our prototypes are:

foldl cmdprefix ?initial? list
foldr cmdprefix ?initial? list

Stoneman: filter

Looking at the Wikipedia's entry for filter we find that the function goes by many names including 'filter', 'find_all', 'grep', 'where' and 'select'. The syntax filter(pred, list) is almost universal.

We prefer the cmdprefix on the left so we can follow established syntax of other languages. I prefer the name recognised in theory: 'filter'; the other names are suggestive of extended functionality that we are not providing.

Our prototype is:

filter predicate list

Filter is an unordered operation. The input list may be processed in any order, and the order of elements in the output list does not necessarily bear any relationship to the order of the input list.

Stoneman: map

Looking at Wikipedia's entry for map we see that a number of langauges support a variadic map (with 2 or more list arguments). Most adopt the syntax map(func, list1, list2, ...).

Where the lists differ in length various strategies are employed:

There is no clear winner, and some languages offer a choice of options. Tcl's foreach offers a precedent for extending short lists with empty string elements, although this is one of the less widely supported approaches.

The route I am choosing is to support the three most common strategies. There are clearly arguments in favour of each of these, and I am not in a position to decide the one best answer (there's a good chance that no-one is).

Our prototype is:

map ?rule? cmdprefix list1 ?list2 ... listN?

Rule is one of "-shortest", "-longest" or "-equal" (prefixes are accepted).

In light of the TIP proposal I am anticipating the "bikeshed" response to this route, and I assert that the colour of this bikeshed is "Twylite". Different languages are clearly taking different approaches to this problem, and it is reasonable to assume that there is a justification for each approach. The extension approach used by foreach is popular in the Tcl community, but in the context of map it is clearly not popular in languages that have a strong Functional Programming heritage. I don't believe that anyone can provide a conclusive argument that their favoured approach is the best in all (or even most) circumstances. Moreover it is worth pointing our that an implementation that calculates the applicale length up front and then loops as foreach does (extending short lists) trivially supports all three cases. Unless someone can make a convincing argument that the C or bytecode implementation will suffer significantly from this approach, I see no reason to argue it.

Map does not have special continue or break handling. Some Tcl implementations of map-like operations do, and some users have found this useful. Nonetheless, special handling of continue or break would impose an order of processing and encourage side-effects in the cmdprefix. It is also surprisingly hard to get a cmdprefix to do something useful and be able to throw a TCL_BREAK or TCL_CONTINUE (as call frames stop these from propagating). In short, if you want to do fancy loop handling with continue or break, use foreach.

Stoneman: range/iota

range is a useful helper to generate input sets (in the mathematical sense) that can be further processed using higher-order functions.

Our prototype is:

range ?from? to ?step?

Stoneman: compose

compose implements function composition. Given two functions z=f(y) and y=g(x), the composition h = f . g (read as "f after g") produces a new function h such that z = h(x) = f(g(x)).

The implementation of compose raises three interesting questions:

  1. Can g() take more than one argument?
  2. Can f() take more than one argument, and if so how can g() return multiple values?
  3. Should compose be arbitrarily limited to two functions; why not 3 or more?

It would appear that (1) and (3) do no harm, and are easily implemented, so we'll go ahead and support them.

TBD: Complete the reasoning on (2). It looks like we can't support this without changing our understanding of what a function returns (until now we have quietly assumed that a function returns a single Tcl value (possibly a list)).

Our prototypes is:

compose func1 func2 ...

The functions are applied in reverse order, so func1 runs after (on the result of) func2 which runs after func3, and so on.