Ackerman, URL and Perl 6

by Arne Sommer

Ackerman, URL and Perl 6

Published 21. June 2019

This is my response to the Perl Weekly Challenge #17.

Challenge #17.1

Create a script to demonstrate Ackermann function. The Ackermann function is defined as below, m and n are positive number:
          A(m, n) = n + 1                  if m = 0
          A(m, n) = A(m - 1, 1)            if m > 0 and n = 0
          A(m, n) = A(m - 1, A(m, n - 1))  if m > 0 and n > 0

Example expansions as shown in wiki page.

         A(1, 2) = A(0, A(1, 1))
                 = A(0, A(0, A(1, 0)))
                 = A(0, A(0, A(0, 1)))
                 = A(0, A(0, 2))
                 = A(0, 3)
                 = 4

This is almost the same problem as the Hofstadter sequences from Challenge 13.2, where I used multiple dispatch («multi sub»). So this is straighforward.

I have chosen to declare a custom type for positive integers (0 included), and use a «MAIN» wrapper procedure to get the arguments from the command line:

File: ackermann
subset PositiveIntZero of Int where * >= 0;

sub MAIN(PositiveIntZero \m, PositiveIntZero \n)
{
  say A(m, n);
}

multi A(PositiveIntZero \m where m == 0, PositiveIntZero \n)
{
  return n + 1;
}

multi A(PositiveIntZero \m where m > 0, PositiveIntZero \n where n == 0)
{
  return A(m - 1, 1);
}

multi A(PositiveIntZero \m where m > 0, PositiveIntZero \n where n > 0)
{
  return  A(m - 1, A(m, n - 1));
}

Running it:

$ perl6 ackermann 1 2
4

We can get rid of all the «where» clauses in the constraints in the «multi»s:

File: ackermann2
subset PositiveInt of Int where * > 0;
subset PositiveIntZero of Int where * >= 0;

sub MAIN(PositiveIntZero \m, PositiveIntZero \n)
{
  say A(m, n);
}

multi A(0, PositiveIntZero \n)
{
  return n + 1;
}

multi A(PositiveInt \m, 0)
{
  return A(m - 1, 1);
}

multi A(PositiveInt \m, PositiveInt \n)
{
  return  A(m - 1, A(m, n - 1));
}

Note that «PositiveIntZero \n where n == 0» can be replaced with the short form «0», if it is ok to loose the named variable.

Recursive procedures are not very efficient when they do a lot of computations. Here is a non-recursive version that also caches the already computed values:

File: ackermann-cached
subset PositiveIntZero of Int where * >= 0;

sub MAIN(PositiveIntZero \m, PositiveIntZero \n)
{
  say A(m, n);
}
my %cache;

sub A(PositiveIntZero \m, PositiveIntZero \n)
{
  unless %cache{m}{n}.defined                # [1]
  {
    if m == 0                                # [3]
    {
      %cache{m}{n} = n + 1;                  # [2]
    }
    elsif n == 0                             # [3]
    {
      %cache{m}{n} = A(m - 1, 1);            # [2]
    }
    else                                     # [3]
    {
      %cache{m}{n} = A(m - 1, A(m, n - 1));  # [2]
    }
  }
  return %cache{m}{n};
}

[1] If the value we need hasn't been computed already,

[2] do so, and save the value.

[3] The «multi»s has been recplaced by a traditional «if»-block.

Let us take a look at the speedup gain:

ArgumentResultackermannackermann-cached
3 1130.2s0.2s
4 165533> 5 hours2.2s
5 1?unknown> 5 hours


The cached version is way better than the recursive version, but even that is pretty slow when we increase the «m» value.

Challenge #17.2

Create a script to parse URL and print the components of URL. According to Wiki page, the URL syntax is as below:

        scheme:[//[userinfo@]host[:port]]path[?query][#fragment]

For example: jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1

          scheme:   jdbc:mysql
          userinfo: user:password
          host:     localhost
          port:     3306
          path:     /pwc
          query:    profile=true
          fragment: h1

This nice illustration, lifted from the wikipedia article, explains it all:

I'll dive stright in, and present a complete program, with a rather complex regex. Perl 6 allows spaces (and newlines) in regexes, so I have added inline comments to show what the individual parts are doing.

File: url-parse
sub MAIN ($url, :$verbose)
{
  if $url ~~
    /^
      (<[a..z]><[a..z 0..9 + . : \-]>*)\: # $0 scheme
      [\/\/                               # //
        [(.*[\:.+]?)\@]?                  # $1 userinfo (optional)
        (<[\w \. \-]>*)                   # $2 host
        [\:(\d+)]?                        # $3 port (optional)
	(\/?)                             # $4 path separator
      ]?                                  # $1-$4 are optional
      ([<[\w \d -] - [#?]>]+)?            # $5 path (optional)
      [\?(<[\w \d \- =]>*)]?              # $6 query (optional) 
      [\#(.*)]?                           # $7 fragment (optional)
    $/
  {

    say $/ if $verbose;

    say "scheme:    $0";
    say "userinfo:  $1" if $1;
    say "host:      $2" if $2;
    say "port:      $3" if $3;

    my $path = $4 if $4; $path ~= $5 if $5;

    say "path:     $path";
    say "query:    $6" if $6;
    say "fragment: $7" if $7;
  }
  else
  {
    say "Invalid URL.";
  }
}

The only thing I'll point out here is the fact that the path is the combination of the two matches «$4» and «$5».

See perlpilot's excellent Introduction to Perl 6 Regex article for an introduction to Perl 6 Regexes.

Running it:

$ perl6 url-parse jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1
scheme:   jdbc:mysql
userinfo: user:password
host:     localhost
port:     3306
path:     /pwc
query:    profile=true
fragment: h1

Add the «verbose» switch if you want to see the match object:

$ perl6 url-parse --verbose jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1
「jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1」
 0 => 「jdbc:mysql」
 1 => 「user:password」
 2 => 「localhost」
 3 => 「3306」
 4 => 「/」
 5 => 「pwc」
 6 => 「profile=true」
 7 => 「h1」
...

IPv4 addresses (hosts) will be validated, but I have disregarded support for IPv6. It is possible to do a DNS lookup for a hostname to see that it actually exists (and reach out for the IP address to see if it is reachable).

Grammars

The regex isn't very reader friendly, even when sprinkled with comments. But we can use a Grammer instead. The «TOP» rule (regex) is the default starting point:

File: url-parse-grammar
grammar URL
{
  regex TOP       { <SchemeW> <Hostinfo>? <Path>? <QueryW>? <FragmentW>? }
  regex SchemeW   { <Scheme> <SchemeS> }
  regex SchemeS   { ':' }
  regex Scheme    { <[a..z]><[a..z 0..9 + . : \-]>* }
  regex Hostinfo  { '//' <UserinfoW>? <Host> <PortW>? }
  regex UserinfoW { <Userinfo> <UserinfoS> }
  regex Userinfo  { .*[\:.+]? }
  regex UserinfoS { '@' }
  regex Host      { <[\w \. \-]>* }
  regex PortW     { <PortS> <Port> }
  regex PortS     { ':' }
  regex Port      { \d+ }
  regex Path      { '/'? <[\w \d -] - [#?]>+ }
  regex QueryW    { <QueryS> <Query> }
  regex QueryS    { '?'  }
  regex Query     { <[\w \d \- =]>* }
  regex FragmentW { <FragmentS> <Fragment> }
  regex FragmentS { '#' }
  regex Fragment  { .+ }
}
  
sub MAIN ($url, :$verbose)
{
  my $result = URL.parse($url);

  if $result
  {
    say $result if $verbose;
    say "scheme:   $/<SchemeW><Scheme>";
    say "userinfo: $/<Hostinfo><UserinfoW><Userinfo>"
      if $/<Hostinfo><UserinfoW><Userinfo>;
    say "host:     $/<Hostinfo><Host>" if $/<Hostinfo><Host>;
    say "port:     $/<Hostinfo><PortW><Port>" if $/<Hostinfo><PortW><Port>;
    say "path:     $/<Path>" if  $/<Path>;
    say "query:    $/<QueryW><Query>" if $/<QueryW><Query>;
    say "fragment: $/<FragmentW><Fragment>" if $/<FragmentW><Fragment>;
  }
  else
  {
    say "Invalid URL.";
  }
}

I have added a trailing «W» to regexes that are needed as wrappers, and «S» to get rid of an explicit symbol (either pre- of postfix) which shouldn't be part of the match.

See Andrew Shitow's Perl 6 Grammars, Part 1 for an introduction to Perl 6 Grammars. (There is no part 2.) The Grammar tutorial in the official documentation is also recommended.

Running it gives the same result, but the match object is quite different:

$ perl6 url-parse-grammar --verbose jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1
「jdbc:mysql://user:password@localhost:3306/pwc?profile=true#h1」
 SchemeW => 「jdbc:mysql:」
  Scheme => 「jdbc:mysql」
  SchemeS => 「:」
 Hostinfo => 「//user:password@localhost:3306」
  UserinfoW => 「user:password@」
   Userinfo => 「user:password」
   UserinfoS => 「@」
  Host => 「localhost」
  PortW => 「:3306」
   PortS => 「:」
   Port => 「3306」
 Path => 「/pwc」
 QueryW => 「?profile=true」
  QueryS => 「?」
  Query => 「profile=true」
 FragmentW => 「#h1」
  FragmentS => 「#」
  Fragment => 「h1」
...

We got a tree in the match object (actually an Abstract Syntax Tree, or AST), and had to follow the path through the structure to get to the nodes (as shown in the «say» statements).

Writing $/<Hostinfo><PortW><Port> to access one value may seem too much trouble, but that's the cost of using a tree.

It is possible to add code in the regexes. And let the grammar itself print the result. E.g.:

  regex Userinfo  { .*[\:.+]? { say "userinfo: $/" if $verbose } }

But this kicks in whenever the regex matches, and it does a lot of backtracking as I have started with a greedy regex (.*) in this case:

userinfo: user:password@localhost:3306?kkkkk#sksks
userinfo: user:password@localhost:3306?kkkkk#sksk
userinfo: user:password@localhost:3306?kkkkk#sks
userinfo: user:password@localhost:3306?kkkkk#sk
userinfo: user:password@localhost:3306?kkkkk#s
userinfo: user:password@localhost:3306?kkkkk#
userinfo: user:password@localhost:3306?kkkkk
userinfo: user:password@localhost:3306?kkkk
userinfo: user:password@localhost:3306?kkk
userinfo: user:password@localhost:3306?kk
userinfo: user:password@localhost:3306?k
userinfo: user:password@localhost:3306?
userinfo: user:password@localhost:3306
userinfo: user:password@localhost:330
userinfo: user:password@localhost:33
userinfo: user:password@localhost:3
userinfo: user:password@localhost
userinfo: user:password@localhos
userinfo: user:password@localho
userinfo: user:password@localh
userinfo: user:password@local
userinfo: user:password@loca
userinfo: user:password@loc
userinfo: user:password@lo
userinfo: user:password@l
userinfo: user:password@
userinfo: user:password

This gives a lot of noise (false positives), and it doesn't answer the challenge as partial matches will be printed out - even if the URL as a whole is invalid.

It is possible to remedy both problems by storing the value in a (global) hash instead of printing it in the regexes, and just let the code rewrite the value again and again (until it gets it right, on the last try). In the «say»-block we can then access the hash instead of the match object. E.g:

my %values;
...
  regex Userinfo  { .*[\:.+]? { %values = ~$/ } }
...
  say "userinfo: %values" if %values;

The output shows that I shouldn't have used a greedy regex here in the first place, as it causes a lot of unnecessary searches (at a cost). I could have used a non-greedy one, and saved the Regex engine for some work. But I'll leave it at that, as the program works and it isn't that critical computational wise in this case.

I do hope you as a reader agree that the important lesson from this challenge is that grammars are actually easier to use than complex regexes. Do compare the two versions, and imagine the regex version as a one-liner, without comments (to make the regex even harder to understand). A grammar doesn't require comments (or at least not that many of them), if the rules have carefully chosen names.

Still not convinced? Scroll up to the «nice illustration, lifted from the wikipedia article» illustration, and compare it with the grammar.