Raku P(i)ermutations

by Arne Sommer

Raku P(i)ermutations

[7] Published 16. April 2019.

Perl 6 → Raku

This article has been moved from «perl6.eu» and updated to reflect the language rename in 2019.

This is my response to the Perl Weekly Challenge #4.

Challenge #4.1

Write a script to output the same number of PI digits as the size of your script. Say, if your script size is 10, it should print 3.141592653.

Raku has «pi» built in:

$ raku
> say pi;       # -> 3.141592653589793
> say pi.WHAT;  # -> (Num)

See docs.raku.org/language/terms#term_pi for more information about «pi».

We can either hard code the script length in the script itself, or we can let the script dynamically get its own size. I'll start with the first one, as that is easier.

Hard Coded File Size

The «fmt» method is the shortest way (in code) of specifying the length of a variable to display. «pi» is a floating point number (the «Num» type), and we can use the «f» flag to decide how many digits to display after the decimal point:

> say pi.fmt('%.2f');  # -> 3.14
> say pi.fmt('%.3f');  # -> 3.142

Always use single quotes around a format string. You really don't want the compiler to think that «%» is a sigil.

See docs.raku.org/routine/sprintf#(Minimum)_Width for more information about the format string.

See docs.raku.org/type/Num for more information about the floating point «Num» type.

We can try to shorten the code, by replacing the parens with the colon syntax (see the Raku Colonoscopy article), but that requires a space after the colon and the size is the same:

> say pi.fmt: '%.2f';  # -> 3.14

But we can ditch the semicolon, as it isn't needed after the last expression in a block. This is probably the shortest possible way of doing this:

File: say-pi:
say pi.fmt('%.18f')

The file has a size of 19 bytes, so 18 digits after the decimal point is the thing. Running it:

$ raku say-pi
3.1415926535897930000

But hey! The trailing «0000» surely is wrong?

> say pi, " ", pi.chars
3.141592653589793 17

The built in «pi» has 16 digits only, 15 of them after the decimal point. So when we asked for more digits Raku did so by adding zeros.

We need a version of «pi» with more digits. We can start with the Math section of the Raku Module List (at «modules.raku.org»), and see if somebody has solved the problem for us.

The «Math::Constants» module looks promising. It actually defines «pi», but with a much lower resolution (number of digits) than the built in version, so that is a dead end:

my constant pi is export = 3.142857;

Dynamic File Size

I'll deal with «pi» later, and have a look at the dynamic version first. We can use the special variable «$?FILE» to get the name of the current file. Applying the «.IO» method on a file name string gives us an «IO» object, and calling the «.s» method gives us the the file size in bytes:

> say "/home/raku/say-pi2".IO.s;  # -> 34

See https://docs.raku.org/language/variables#Compile-time_variables ($?FILE), docs.raku.org/routine/IO (IO) and docs.raku.org/routine/s (s) for more information.

Note that the number of characters in the file (as shown when you open the file in an editor) may not match the byte count (as reported by «.s»). Different ways of coding newlines is the oldest problem, and Unicode is the newest. Both topics are covered in my Beginning Raku course.

The program now looks like this:

File: say-pi2
say pi.fmt("%.{$?FILE.IO.s -1}f")

I had to use double quotes to enable curly parens interpolation in the string. (The inside of curlies in a double quoted string is taken as code, and the result is placed in the string.) I told you to always use single quotes earlier, but this is the exception to that rule.

Running it:

raku say-pi2
3.141592653589793000000000000000000

We should count the digits, to ensure that it is correct:

> say "3.141592653589793000000000000000000".chars;  # -> 35

The file has 34 bytes, so I got it right.

We can modify the script, by e.g. inserting extra line breaks or the missing «shebang» line to ensure that it continues to work as advertised. E.g.

File: say-pi2shebang
#! /usr/bin/env raku

say pi.fmt("%.{$?FILE.IO.s -1}f")

Running it gives a string that is 58 characters long, and the file has 57 bytes, so this is also correct.

$ raku say-pi2shebang 
3.14159265358979300000000000000000000000000000000000000000

The trailing zeros are really annoying, so I'll deal with them.

Fixing Pi

en.wikipedia.org/wiki/Pi gives us «pi» with 50 digits: «3.14159265358979323846264338327950288419716939937510».

We can try that value, in REPL, and see what happens:

> 3.1415926535897932384626433832795028841971693993751
  3.14159265358979321627946859855630841326670589198336

I have added the space on the output to make it easier to see that the values differ. So we have a serious rounding error, making the value unusable.

The solution is to quote the value, turning it into a string, and use «substr» to get the correct number of characters:

File: say-pi3
#! /usr/bin/env raku

my $pi = "3.1415926535897932384626433832795028841971693993751";

say $pi.WHAT;
say $pi;

say $pi.substr(0, $?FILE.IO.s +1);

See docs.raku.org/routine/substr for more information about «substr».

Fixing Pi, Really

This doesn't work at all, as you obviously have spotted. The number of digits to display is the same as the file size. But we must specify all those digits in the file itself, so the file will always be too big for the value we use.

The solution is to move the definition of «pi» to a module, and use that. This can be considered as cheating, but I cannot see any other way forward.

So we set up a module «PiXL» (as a short name for what it is about), and place it locally:

File: lib/PiXL.rakumod (partial)
use v6;

unit module PiXL;

our constant $PI is export = "3.14159265...

I went slightly overboard, and specified «pi» with 1000 digits (taken from www.math.com/tables/constants/pi.htm).

The program looks like this now:

File: say-pi-module
#! /usr/bin/env raku

use lib "lib";

use PiXL;

say $PI.substr(0, $?FILE.IO.s -1);

Running it:

$ raku say-pi-module
3.14159265358979323846264338327950288419716939937510582097\
49445923078164062862089986

The \ character means that I have inserted a newline (to make the text fit the web page).

It is possible to make the program shorter by removing the Shebang line. The next thing to do is making the module installable, actually install it, and get rid of the «use lib» line. But I'll stop here.

Exercise: Change the size of the file (by adding or removing newlines or comments) and run it to see that the output changes as well.

Pi as a Procedure

We can shorten the code by implementing «pi» as a procedure, taking the size as argument.

File: lib/PiXL.pm6 (partial)
our sub PI (Int $length where $length >= 4) is export
{
  return $PI.substr(0, $length);
}

The modified program (which is 9 bytes shorter) looks like this:

File: say-pi-module2
#! /usr/bin/env raku

use lib "lib";

use PiXL;

say PI($?FILE.IO.s -1);

Challenge #4.2

You are given a file containing a list of words (case insensitive 1 word per line) and a list of letters. Print each word from the file than can be made using only letters from the list. You can use each letter only once (though there can be duplicates and you can use each of them once), you don’t have to use all the letters.

We have a single file with the words as well as the letters. So we can start by reading the file, and sort out the content:

File: verbose
for lines() -> $line
{
  if $line.chars == 1
  {
    say "Letter: $line";
  }
  else
  {
    say "Word: $line";
  }
}

«lines()» gives us the content of the file(s) specified on the command line, one line at a time. If you run the program without arguments it will wait for the user to input them, one line at a time. See the «lines()» section in my Raku Gather, I Take article for more information.

Running it:

$ raku verbose words.txt
Word: abba
Word: axiom
Letter: a
Letter: b
Letter: b
Letter: a
Word: cello
Word: bab
Letter: x
Letter: f
File: words.txt
abba
axiom
a
b
b
a
cello
bab
x
f

We start with saving the letters and words for later, using a list for the letters (allowing duplicates) and a SetHash for the words (not allowing duplicates):

File: combinations (partial)
my @letters;
my %words is SetHash;

for lines() -> $word-or-character
{
  if $word-or-character.chars == 1
  {
    @letters.push($word-or-character.lc);
  }
  else
  {
    %words{$word-or-character.lc} = True;
  }
}

The Challenge said «case insensitive», so I convert everything to lower case (with «.lc») just in case. (Pun intended.)

«SetHash» is a variant of hash, where the values can only be «True» or «False» (and where we can use Set operators, but we don't need them here). See docs.raku.org/type/SetHash for more information.

Before we start making up words, we should do some thinking. We have 6 letters in our file, and combining them in all the possible ways are unneccessary. The longest word in the file (both «axiom» and «cello» have 5 letters) is the upper limit for the length of a valid word. Getting this value is impressive, elegant and utterly confusing if you haven't seen the >> syntax before:

File: combinations (partial)
my $max-length = %words.keys>>.chars.max;
#                [1]    [2] [3] [4]  [5]

[1] We start with the hash (or rather SetHash) of words,

[2] taking all the keys, giving us a list of words.

[3] We use the Hyper Method Call Operator «>>» to run the next operation on all the elements (possibly in parallel).

[4] The operation we use is «chars», which gives us the the length of each word.

[5] And finally we apply «max» on the list of lengths to reduce it to a single value, the highest value - and the longest word.

See docs.raku.org/language/operators#postfix_»._/_postfix_>>. for more information about «>>.».

The «combinations» method applied on a list of characters gives us every possible combination, from zero length to all of them. The return value is a list of lists. We can have a go in REPL:

> say <a b e e>.combinations
(() (a) (b) (e) (e) (a b) (a e) (a e) (b e) (b e) (e e) (a b e) (a b e) \
(a e e) (b e e) (a b e e))

The list has duplicates if the input list has duplicates, as «combinations» doesn't care about the actual values it has been asked to combine (and doesn't check for uniqueness). That is a good thing, as we need to support duplicate letters.

Note that we get the list «(a e)» but not «(e a)». That is because «combinations» gives us the values, and doesn't care about the order. I'll deal with that later.

See docs.raku.org/routine/combinations for more information about «combinations».

Then we can have a go at making up the words:

File: combinations (partial)
my %seen;                        # [1]

for @letters.combinations: 2 .. $max-length -> $candidate  # [2]
{
  my $word = $candidate.join;    # [3]
  next if %seen{$word};          # [4]
  %seen{$word} = True;           # [4]
  
  print "$word ";                # [5]
}

print "\n";                      # [6]

[1] We use this variable to keep track of words we have already considered, to avoid duplicates. Duplicates in the output looks amateurishly, and it takes extra time to compute them twice (or more times) as well.

[2] We start with combinations of two characters, as the single characters are the letters, and stop when we reach the upper limit (the length of the longest word) which in our case is 5. This is specified as an integer range (with «..» between the lower and upper limits).

[3] The return value from «combinations» is a list (don't let the «$» sigil confuse you), and we use «join» to turn it into a string.

[4] Skip the word if we have seen it before.

[5] Print the words, one at a time with a space between them.

[6] Finish the list with a newline.

Running it:

$ raku combinations words.txt 
ab aa ax af bb ba bx bf xf \
abb aba abx abf aax aaf axf bba bbx bbf bax baf bxf \
abba abbx abbf abax abaf abxf aaxf bbax bbaf bbxf baxf \
abbax abbaf abbxf abaxf bbaxf

This gave us all the combinations of letters to form possible words, but we need to shuffle the characters around as well.

We can use the «permutations» method to give us the shuffled versions of a list of characters:

> say <a b c>.permutations
((a b c) (a c b) (b a c) (b c a) (c a b) (c b a))

> say <a a b>.permutations
((a a b) (a b a) (a a b) (a b a) (b a a) (b a a))
Note the duplicates in the result, but only if we have duplicates in the character list.

See docs.raku.org/routine/permutations for more information about «permutations».

Replace the line marked [5] in the program with the following block (and remove the line marked [6]):

  for $word.comb.permutations.map(*.join).sort.unique -> $possible
  #   [1]   [2]  [3] ######## [4] ####### [5]  [6]
  {
    say $possible if %words{$possible};  # [7]
  }
[1] We start with the word,

[2] converting ut to a list of single characters with «comb»,

[3] as «permutations» takes a list as input.

[4] «permutations» returns a list of list, and we use «map» to do something with the outer list. The something in this case is joining each of the inner lists together to a string.

[5] Then we apply «sort» to get the list (the outer list, which now contains strings) in sorted order (primarily so that the output from different runnings of the program will give the output in the same order).

[6] Applying «unique» on the list removes duplcates. (We could have used «squish» instead, as the list is sorted. But be careful, as it relies on identical values beeing adjacent!)

[7] Output the word if it is in the list of legal words.

The complete program:

File: combinations
my @letters;
my %words is SetHash;

for lines() -> $word-or-character
{
  if $word-or-character.chars == 1
  {
    @letters.push($word-or-character.lc);
  }
  else
  {
    %words{$word-or-character.lc} = True;
  }
}

my $max-length = %words.keys>>.chars.max;

my %seen;

for @letters.combinations: 2 .. $max-length -> $candidate
{
  my $word = $candidate.sort.join;   # [1]
  next if %seen{$word};
  %seen{$word} = True;
  
  for $word.comb.permutations.map(*.join).sort.unique -> $possible
  {
    say $possible if %words{$possible};
  }
}

[1] The extra «sort» here is needed, as further testing revealed. The «Oops?» section (below) has the details.

Running it:

$ raku combinations words.txt
bab
abba

Oops?

It is possible that I have taken the challenge too litterally, and that the letters should be separate from the word file. E.g. specified as arguments to the program:

$ raku combinations2 words2.txt a b b a
bab
abba

That is easy to fix. We start with the arguments, and use the «MAIN» function to get hold of them:

File: combinations2 (partial)
unit sub MAIN ($word-file where $word-file.IO.r,      # [1]
               *@letters where @letters.elems >= 1);  # [2]

my %words is SetHash;

%words{$_.lc} = True for $word-file.IO.lines();  # [3]

[1] The first argument to the program (and thus MAIN) is a file name, and we use «where» to add a constraint on it. In this case we require a readable file (with «IO.r»).

[2] The letters come as separate arguments, but we can collect them all in an array by turning the argument into a slurpy argument with a prefix «*». A slurpy argument allows zero elements, so I have added a «where» clause to ensure that we have at least one letter.

[3] Reading the file is easier this time, as we don't have to filter out the single letters. But we must use «.IO.lines()» to read the lines, as the file name is an ordinary argument now. This line replaces the entire first «for» loop.

I have used «unit sub MAIN;» instead of just «sub» and curlies (e.g. «sub MAIN { }»

Change the «2» to «1» in this line to allow one letters words (as «I»):

for @letters.combinations: 2 .. $max-length -> $candidate

See docs.raku.org/language/create-cli#index-entry-MAIN for more information about «MAIN».

See docs.raku.org/type/Signature#index-entry-where_clause for more information about «where».

See docs.raku.org/language/create-cli#index-entry-declarator_unit_(MAIN) for more information about «unit MAIN».

See docs.raku.org/type/Signature#index-entry-slurpy_argument for more information about slurpy arguments.

This version can be used on normal dictionary files. Ubuntu Linux has the following dictionaries:

  • /usr/share/dict/american-english (the «wamerican» package)
  • /usr/share/dict/british-english (the «wenglish» package)
  • /usr/share/dict/ngerman (the «wngerman» package)

Running it with the english one:

$ raku combinations2 /usr/share/dict/british-english a b b a
a
b
ba
ba
baa

The duplicate «ba» is caused by the outer «for» loop. The first one comes from the «(a b)» candidate (using the first two letters in the argument), and the second from the «(a b)» candidate (using the third and fourth letters). We can avoid that by ensuring that the «%seen» code handles them by making sure they are equal. Adding «sort» does that:

  my $word = $candidate.sort.join;

The original «combinations» program had this problem as well, and has been fixed.

The complete program:

File: combinations2
unit sub MAIN ($word-file where $word-file.IO.r,
               *@letters where @letters.elems >= 1);

my %words is SetHash;

%words{$_.lc} = True for $word-file.IO.lines();

my $max-length = %words.keys>>.chars.max;

my %seen;

for @letters.combinations: 1 .. $max-length -> $candidate
{
  my $word = $candidate.sort.join;
  next if %seen{$word};
  %seen{$word} = True;
  
  for $word.comb.permutations.map(*.join).sort.unique -> $possible
  {
    say $possible if %words{$possible};
  }
}