Euler's URL with Perl 6

by Arne Sommer

Euler's URL with Perl 6

Published 15. August 2019

This is my response to the Perl Weekly Challenge #21.

Challenge #21.1

«Write a script to calculate the value of e, also known as Euler’s number and Napier’s constant. Please checkout wiki page for more information.»

«e» is predefined in Perl 6, so we can just do this:

my $e = 𝑒;

If you don't like fancy Unicode characters, a normal ascii «e» works as well:

my $e = e;

The number of digits after the decimal point isn't very impressive:

say e;  # -> 2.718281828459045

The reason is that «e» is a «Num» and not a «Rat»:

say e.WHAT;  # -> (Num)

So we'll do it the hard way.

The wikipedia page has this definition:

The pattern is nice; the first value is 1, and after that we get the next value by dividing the previous value by 1 (the last number in the multiplication, which actually isn't there the first time), then the next one by 2, and so on. The definition is an infinite Sequence, so I'll set it up as such - with gather/take:

File: finding-e
my $e-seq := gather         # [1]
{
  take 1;                   # [2]

  my $current = 1;          # [3]

  for 1 .. Inf              # [4]
  {
    $current /= $_;         # [5]
    take $current;          # [6]
  }
}

sub MAIN (:$steps = 10)     # [7]
{
  say $e-seq[^$steps].sum;  # [8]
}

[1] The sequence.

[2] The first value is 1.

[3] The current value, used as the source for the next one.

[4] Loop through the dividers.

[5] Divide the previous value by the current divider to get the new value, and assign it back to «$current».

[6] Add the value to the sequence.

[7] Specify how many steps you want (including the first value; 1) with the optional «--steps» command line option. The default value is 10.

[8] Add up the required number of values (steps).

See my Perl 6 gather, I take article for more information about gather/take.

Running ut:

$ perl6 finding-e --steps=1    # -> 1
$ perl6 finding-e --steps=2    # -> 2
$ perl6 finding-e --steps=3    # -> 2.5
$ perl6 finding-e --steps=4    # -> 2.666667
$ perl6 finding-e --steps=5    # -> 2.708333
$ perl6 finding-e --steps=6    # -> 2.716667
$ perl6 finding-e --steps=7    # -> 2.718056
$ perl6 finding-e --steps=8    # -> 2.718254
$ perl6 finding-e --steps=9    # -> 2.718279
$ perl6 finding-e --steps=10   # -> 2.718282
$ perl6 finding-e --steps=100  # -> 2.718281828459045
$ perl6 finding-e --steps=1000 # -> 2.718281828459045

The output looks nicer and nicer, until the last one - which has the same value as the one above. That certainly isn't right. Some debug output is the thing:

File: finding-e-verbose (changes only)
sub MAIN (:$steps = 10, :$verbose)                                # [1]
{
  $verbose && say "{ $_ + 1 }: { $e-seq[$_].perl }" for ^$steps;  # [2]

  say $e-seq[^$steps].sum;
}

[1] An optional «--verbose» flag, which is disabled by default.

[2] Print the values in the Sequence, one at a time, to the specified upper limit.

Running it:

$ perl6 finding-e-verbose --steps=25 --verbose
1: 1
2: 1
3: 0.5
4: <1/6>
5: <1/24>
6: <1/120>
7: <1/720>
8: <1/5040>
9: <1/40320>
10: <1/362880>
11: <1/3628800>
12: <1/39916800>
13: <1/479001600>
14: <1/6227020800>
15: <1/87178291200>
16: <1/1307674368000>
17: <1/20922789888000>
18: <1/355687428096000>
19: <1/6402373705728000>
20: <1/121645100408832000>
21: <1/2432902008176640000>
22: 1.9572941063391263e-20
23: 8.896791392450574e-22
24: 3.868170170630684e-23
25: 1.6117375710961184e-24
2.718281828459045

Everything is fine until we reach the 22nd value. The previous values are of the «Rat» type, but now the program switches to «Num».

Blame the Rat

The documentation for «Rat» explains it like this: «To prevent the numerator and denominator from becoming pathologically large, the denominator is limited to 64 bit storage. On overflow of the denominator a Num (floating-point number) is returned instead

That doesn't sound good.

But read on, and you'll find this section: «If you want arbitrary precision arithmetic with rational numbers, use the FatRat type instead.»

See https://doc.perl6.org/type/Rat for more information about the «Rat» type.

See https://doc.perl6.org/type/FatRat for more information about the «FatRat» type.

Declaring the «$current» variable as a «FatRat» should do it:

File: finding-e-fatrat (changes only)
  my FatRat $current = 1.FatRat;

Note the assignment. A «FatRat» variable doesn't like Integers, so we have to coerce «1» to a «FatRat».

Running it:

$ perl6 finding-e-fatrat --steps=25 --verbose
1: 1
2: 1
3: FatRat.new(1, 2)
4: FatRat.new(1, 6)
5: FatRat.new(1, 24)
6: FatRat.new(1, 120)
7: FatRat.new(1, 720)
8: FatRat.new(1, 5040)
9: FatRat.new(1, 40320)
10: FatRat.new(1, 362880)
11: FatRat.new(1, 3628800)
12: FatRat.new(1, 39916800)
13: FatRat.new(1, 479001600)
14: FatRat.new(1, 6227020800)
15: FatRat.new(1, 87178291200)
16: FatRat.new(1, 1307674368000)
17: FatRat.new(1, 20922789888000)
18: FatRat.new(1, 355687428096000)
19: FatRat.new(1, 6402373705728000)
20: FatRat.new(1, 121645100408832000)
21: FatRat.new(1, 2432902008176640000)
22: FatRat.new(1, 51090942171709440000)
23: FatRat.new(1, 1124000727777607680000)
24: FatRat.new(1, 25852016738884976640000)
25: FatRat.new(1, 620448401733239439360000)
2.7182818284590452353602874

That looks promising. Let's try with 1000 steps:

$ perl6 finding-e-fatrat --steps=1000
2.718281828459045235360287471352662497757247093699959574966967627724076630353547
59457138217852516642742746639193200305992181741359662904357290033429526059563073
81323286279434907632338298807531952510190115738341879307021540891499348841675092
44761460668082264800168477411853742345442437107539077744992069551702761838606261
33138458300075204493382656029760673711320070932870912744374704723069697720931014
16928368190255151086574637721112523897844250569536967707854499699679468644549059
87931636889230098793127736178215424999229576351482208269895193668033182528869398
49646510582093923982948879332036250944311730123819706841614039701983767932068328
23764648042953118023287825098194558153017567173613320698112509961818815930416903
51598888519345807273866738589422879228499892086805825749279610484198444363463244
96848756023362482704197862320900216099023530436994184914631409343173814364054625
31520961836908887070167683964243781405927145635490613031072085103837505101157477
04171898610687396965521267154688957035035402123407849819334321068170121005627880
23519303322474501585390473041995777709350366041699732972508868769664035557071622
68447162560798826517871341951246652010305921236677194325278675398558944896970964
09754591856956380236370162112047742722836489613422516445078182442352948636372141
74023889344124796357437026375529444833799801612549227850925778256209262264832627
79333865664816277251640191059004916449982893150566047258027786318641551956532442
58698294695930801915298721172556347546396447910145904090586298496791287406870504
89585867174798546677575732056812884592054133405392200011378630094556068816674001
69842055804033637953764520304024322566135278369511778838638744396625322498506549
95886234281899707733276171783928034946501434558897071942586398772754710962953741
52111513683506275260232648472870392076431005958411661205452970302364725492966693
81151373227536450988890313602057248176585118063036442812314965507047510254465011
72721155519486685080036853228183152196003735625279449515828418829478761085263981
39559900673764829224437528718462457803619298197139914756448826260390338144182326
25150974827987779964373089970388867782271383605772978824125611907176639465070633
04527954661855096666185664709711344474016070462621568071748187784437143698821855
96709591025968620023537185887485696522000503117343920732113908032936344797273559
55277349071783793421637012050054513263835440001863239914907054797780566978533580
48966906295119432473099587655236812859041383241160722602998330535370876138939639
17795745401613722361878936526053815584158718692553860616477983402543512843961294
60353

That is 2465 characters, and certainly looks promising. The time usage is ok; it took about 1.3 seconds on my box.

But is it correct? We should perhaps compare the value with an official version of «e», as I did with Van Eck’s sequence in Challenge 14.1.

The next version of the program shows the version of «e» that the program computes, with any wrong digits highlighted in red. The correct number, with two extra digits, is shown on the next line. Enable this with the «--test» command line option:

File: finding-e-fatrat-test
my $e-seq := gather
{
  take 1;

  my FatRat $current = 1.FatRat;

  for 1 .. Inf
  {
    $current /= $_;
    take $current;
  }
}

sub MAIN (:$steps = 10, :$verbose, :$test)
{
  $verbose && say "{$_ + 1}: { $e-seq[$_].perl }" for ^$steps;

  my $value = $e-seq[^$steps].sum;

  if $test
  {
    my $long = get-euler-from-web;                              # [3]

    print "Answer:  ";
    for ^$value.chars -> $pos                                   # [1]
    {
      $value.substr($pos, 1) eq $long.substr($pos, 1)           # [1]
      ?? print $value.substr($pos, 1)
      !! print "\x1b[41m" ~ $value.substr($pos, 1) ~ "\x1b[0m"; # [2]
    }
    print "\n";
    say "Correct: " ~ $long.substr(0, $value.chars + 2) ~ "...";

  }
  else
  {
    say $e-seq[^$steps].sum;
  }
}

sub get-euler-from-web                                         # [3]
{
  use LWP::Simple;                                             # [4]

  my $e-string = "";

  for LWP::Simple.get('http://www-history.mcs.st-and.ac.uk/HistTopics/e_10000.html').lines -> $line # [4]
  {
    $e-string ~= $line.trim unless $line ~~ /<[a .. z A .. Z]>/; # [5]
  }

  return $e-string;
}

[1] Compare one character at a time,

[2] If they differ, show the "wrong" character with red background. The Ansi sequence \x1b[41m turns on red background, and \x1b[0m resets the colours to the default values. If your terminal doesn't support Ansi escape codes for colour you are out of luck.

[3] The program fetches Euler's Number (with 10_000 digits) from Internet every time you request testing, so will not work offline. (It is easy to rewrite it to use a local file copy, and only download the file if the local copy is missing. I'll do just that in the next section.

[4] I use «LWP::Simple.get» to get a document from the web. In this case http://www-history.mcs.st-and.ac.uk/HistTopics/e_10000.html.

[5] Skip lines not part of the number, i.e. with letter(s) in them. Remove leading and trailing spaces, and glue the result together.

Running it:

$ perl6 finding-e-fatrat-test --test --steps=10
Answer:  2.718282
Correct: 2.71828182...

$ perl6 finding-e-fatrat-test --test --steps=20
Answer:  2.7182818284590452
Correct: 2.718281828459045235...

$ perl6 finding-e-fatrat-test --test --steps=22
Answer:  2.7182818284590452353594
Correct: 2.718281828459045235360287...

$ perl6 finding-e-fatrat-test --test --steps=100
Answer:  2.71828182845904523536028747135266249775724709369995957496696762772407
6630353547594571382178525166427427466391932003059921817413596629043572900334295
2605956307
Correct: 2.71828182845904523536028747135266249775724709369995957496696762772407
6630353547594571382178525166427427466391932003059921817413596629043572900334295
260595630738...

Sometimes the value is correct (with the computed digits), as shown above without red highlighting, and sometimes 1 or more digits at the end are wrong. But the more digits we add to «e», by increasing «steps», the smaller this rounding error becomes.

Caching e

File: finding-e-fatrat-test-cached (partial)
    my $long = get-euler-from-web($test); # [1]

    ...

sub get-euler-from-web ($test)
{
  use LWP::Simple;

  my $e-string = "";

  if $test eq "cached"   # [2]
  {
    say "Loaded cached e.";
    return $*TMPDIR.add('euler_10000.txt').slurp # [3]
      if $*TMPDIR.add('euler_10000.txt').e;      # [3]
  }

  for LWP::Simple.get('http://www-history.mcs.st-and.ac.uk/HistTopics/e_10000.html').lines -> $line
  {
    $e-string ~= $line.trim unless $line ~~ /<[a .. z A .. Z]>/;
  }
  
  if $test eq "cached"                                # [4]
  {
    $*TMPDIR.add('euler_10000.txt').spurt: $e-string; # [5]
    say "Saved cached e.";
  }

  return $e-string;
}

[1] We pass the «test» value on,

[2] • and if it is "cached" we check for a saved (cached) version of «e»,

[3] • and use that if it exists.

[4] If it wasn't cached (and returned in #3), we read it the block above this one,

[5] • and save it if test has the value "cached". Here we use «spurt» to write a whole file at once.

$*TMPDIR gives us the «system temporary directory» as an IO object. Using the «add» method to add the filename avoids using the directory separator, and the result is a program that works just as well on Linux as on Windows. «slurp» is used to get the content of the whole file, is one go, without the normal open/read/close cycle. «spurt» is the opposite, as it writes the whole file in one go.

See https://docs.perl6.org/routine/spurt for more information about «spurt».

See https://docs.perl6.org/routine/slurp for more information about «slurp».

If you simply want a working version of the program, without all the testing stuff, her it is:

File: finding-e-fixed
my $e-seq := gather
{
  take 1;

  my FatRat $current = 1.FatRat;

  for 1 .. Inf
  {
    $current /= $_;
    take $current;
  }
}

sub MAIN (:$steps = 10)
{
  say $e-seq[^$steps].sum;
}

Challenge #21.2

Write a script for URL normalization based on rfc3986. This task was shared by Anonymous Contributor.

According to Wikipedia, URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized URL so it is possible to determine if two syntactically different URLs may be equivalent.»

Let us revisit the URL parsing Grammar from my solution to Challenge 17.2. Here we have the Grammar, with some changes highlighted:

File: url-normalization (partial)
grammar URL
{
  regex TOP       {  ? ? ? ? }
  regex SchemeW   {   }
  regex SchemeS   { ':' }
  regex Scheme    { <[a..z A..Z]><[a..z A..Z 0..9 + . : \-]>* } # [1]
  regex Hostinfo  { '//' ?  ? }
  regex UserinfoW {   }
  regex Userinfo  { .*[\:.+]? }
  regex UserinfoS { '@' }
  regex Host      { <[\w \. \-]>* }
  regex PortW     {   }
  regex PortS     { ':' }
  regex Port      { \d+ }
  regex Path      { '/'? <[\w \d / %  /\. ] - [#?]>+ } # [2]
  regex QueryW    {   }
  regex QueryS    { '?'  }
  regex Query     { <[\w \d \- =]>* }
  regex FragmentW {   }
  regex FragmentS { '#' }
  regex Fragment  { .+ }
}

[1] This change makes it possible to specify the Scheme (e.g. http) with uppercase letters.

[2] This was actually an error (in my solution to Challenge 17.2), that prevented Paths with slashes, «%» or «.» in them from beeing parsed.

The wiki page presents 4 «Normalizations that preserve semantics», and I have chosen to only do those.

  • R1: Converting the scheme and host to lower case
  • R2: Capitalizing letters in escape sequences
  • R3: Decoding percent-encoded octets of unreserved characters
  • R4: Removing the default port
(The others are «Normalizations that usually preserve semantics» and «Normalizations that change semantics».)

This is the framework for the rest of the program, after the Grammar:

File: url-normalization (partial)
sub MAIN ($url, :$verbose)                                                 # [1]
{
  my %ports = ( ftp => 20, http => 80, https => 443 );                     # [2]
  my Set $unreserved = ("41" ... "49", "4A" ... "4F", "50" ... "59", "5A", # [3]
                        "30" ... "39", "2D", "2E", "5F", "7E").flat.Set;

  my %translate;
  %translate{'%' ~ $_} = chr( $_.parse-base(16) ) for $unreserved.keys;    # [4]

  my $result = URL.parse($url);               # [5]

  if $result 
  {
    ...                                       # [6]
  }
  else
  {
    say "Invalid URL.";                       # [7]
  }
}

[1] I have added a verbose mode here as well, to make it easier to see what is going on.

[2] Rule «R4» tells us to remove the default port for the given protocol, so I need a mapping between the shcemes (which I would call protocols) (or at least the most used ones) and ports. See https://en.wikipedia.org/wiki/Port_(computer_networking) for a list of protocols and default ports.

[3] Rule «R3» tells us to translate some %-prefixed strings to the actual value. The strings to translate are given here (without the initial «%»),

[4] and we set up the mapping to the real character here. The %-string is a hexadecimal value, which is the ascii code of the character. So we turn it into a (decimal) number with «parse-base(16)» and then to the actual character with «chr».

[5] Parse the URL with the Grammar. We store the result (a Match object) in a variable, as I need to do a Regex match later on - and that will overwrite the global Match object «$/».

[6] If it parses, do something...

[7] or print an error message.

Now the code that replaces the «...» part, block for block:

    say $result, "\n" if $verbose;             # [8]
    
    my $scheme = $result<SchemeW><Scheme>.lc;  # [9]
    my $new    = $scheme ~ ":";                # [10]
    say "scheme:   $result<SchemeW><Scheme> -> $scheme" if $verbose; # [11]

[8] Show the Match object.

[9] Get the Scheme, in lower case.

[10] This variable is used to build up the normalized version of the URL.

[11] Show the old and new version of the Scheme.

    my $userinfo = ""; if $result<Hostinfo><UserinfoW><Userinfo>
    {
      $userinfo = $result<Hostinfo><UserinfoW><Userinfo>;
      $new ~= $userinfo ~ '@';  # [12]
      say "userinfo: $userinfo" if $verbose;
    }

[12] Remember to add the «@» back on to the Userinfo, if present, as the Grammar removed it.

    my $host = ""; if $result<Hostinfo><Host>
    {
      $host = $result<Hostinfo><Host>.lc;
      $new ~= "//$host" if $host;  # [13]
      say "host:     $result<Hostinfo><Host> -> $host" if $verbose;
    }

[13] As above, but for the Host, and the missing part is «//».

    my $port = ""; if $result<Hostinfo><PortW><Port>
    {
      $port = $result<Hostinfo><PortW><Port>
        unless %ports{$scheme} == $result<Hostinfo><PortW><Port>; # [14]
      $new ~= ":$port" if $port;
      say "port:     $result<Hostinfo><PortW><Port> -> $port" if $verbose;
    }

[14] Add the port number, but only if it is a non-default port (for the given scheme).

    my $path = ""; if $result<Path>
    {
      $path = $result<Path>;
      my $new-path = $path;
      
      if $path ~~ /\%/                        # [15]
      {   
        $new-path .= subst(/\%../, *.uc, :g); # [16]

        for %translate.keys -> $key        # [17]
	{
	  if $new-path ~~ /$key/              # [17a]
	  {
            say "          (path translate $key -> %translate{$key})"
              if $verbose;
            $new-path .= subst($key, %translate{$key}, :g); # [17b]
          }
        }
      }
      $new ~= $new-path;
      say "path:     $path -> $new-path" if $verbose;
    }

[15] If the path contains one or more «%»,

[16] • convert the escape sequences (Rule R2) to Uppercase.

[17] • for all the escape sequences that should have been a regular character (Rule R3), check if it is present (# 17a), and if it is, replace it (once or more; hence the «:g» as in global) (# 18b).

    my $query; if $result<QueryW><Query>
    {
      $query = $result<QueryW><Query>;
      $new ~= "?$query"; # [18]
      say "query:    $query" if $verbose;
    }

[18] Remember the «?» prefix on the Query string, if given.

    my $fragment = ""; if $result<FragmentW><Fragment>
    {
      $fragment = $result<FragmentW><Fragment>;
      $new ~= "#$fragment";
      say "fragment: $fragment" if $verbose;
    }

[19] And the «#» prefix for the Fragment, if given.

    print "\n" if $verbose;
    
    say "Original: $url";  # [20]
    say "New:      $new";  # [21]

[20] The original URL.

[21] The normalised one.

Running it:

$ perl6 url-normalization http://perl6.eu/eulers-url.html
Original: http://perl6.eu/eulers-url.html
New:      http://perl6.eu/eulers-url.html

$ perl6 url-normalization HTTPS://perl6.EU/eulers-url.html
Original: HTTPS://perl6.EU/eulers-url.html
New:      https://perl6.eu/eulers-url.html

$ perl6 url-normalization Http://en.wikipedia.org:80/aaa/as
Original: Http://en.wikipedia.org:80/aaa/as
New:      http://en.wikipedia.org/aaa/as

$ perl6 url-normalization Http://en.wikipedia.org:443/aaa/as
Original: Http://en.wikipedia.org:443/aaa/as
New:      http://en.wikipedia.org:443/aaa/as

$ perl6 url-normalization Https://en.wikipedia.org:443/aaa/as
Original: Https://en.wikipedia.org:443/aaa/as
New:      https://en.wikipedia.org/aaa/as

$ perl6 url-normalization http://en.wikipedia.org:80/aaa%339%7e/as?7eQ#1
Original: http://en.wikipedia.org:80/aaa%339%7e/as?7eQ#1
New:      http://en.wikipedia.org/aaa39~/as?7eQ#1

And finally, let's do one with the «--verbose» command line option:

$ perl6 url-normalization --verbose Http://en.wikipedia.org:80/aaa%12/as?A1a#1
「Http://en.wikipedia.org:80/aaa%12/as?A1a#1」
 SchemeW => 「Http:」
  Scheme => 「Http」
  SchemeS => 「:」
 Hostinfo => 「//en.wikipedia.org:80」
  Host => 「en.wikipedia.org」
  PortW => 「:80」
   PortS => 「:」
   Port => 「80」
 Path => 「/aaa%12/as」
 QueryW => 「?A1a」
  QueryS => 「?」
  Query => 「A1a」
 FragmentW => 「#1」
  FragmentS => 「#」
  Fragment => 「1」

scheme:   Http -> http
host:     en.wikipedia.org -> en.wikipedia.org
port:     80 -> 
path:     /aaa%12/as -> /aaa%12/as
query:    A1a
fragment: 1

Original: Http://en.wikipedia.org:80/aaa%12/as?A1a#1
New:      http://en.wikipedia.org/aaa%12/as?A1a#1

The complete program is available in the zip file.

And that's it.