Sexy Primes, LZW and Raku

by Arne Sommer

Sexy Primes, LZW and Raku

[28] Published 25. August 2019

Perl 6 → Raku

This article has been moved from «perl6.eu» and updated to reflect the language rename in 2019.

This is my response to the Perl Weekly Challenge #022.

Challenge #22.1

Write a script to print first 10 Sexy Prime Pairs. Sexy primes are prime numbers that differ from each other by 6. For example, the numbers 5 and 11 are both sexy primes, because 11 - 5 = 6. The term “sexy prime” is a pun stemming from the Latin word for six: sex. For more information, please checkout wiki page.

We need primes, and this is the easiest way of obtaining them, as an ordered sequence:

my $primes := (1 .. Inf).grep(*.is-prime);  # [1]

[1] Using the is-prime method to sort out the primes.

See docs.raku.org/routine/is-prime for more information about is-prime.

If you have followed the Perl Weekly Challenge (and especially my solutions) for some time, this is familiar. If not take a look at one of the following articles:

Now the actual challenge, which is quite easy given the Prime Sequence above:

File: sexy-prime-pairs
my $primes := (1 .. Inf).grep(*.is-prime);

my $sexy-primes := $primes.grep({ ($_ + 6).is-prime });               # [1]

say "{ $_ +1 }: ($sexy-primes[$_],{ $sexy-primes[$_] +6 })" for ^10;  # [2]

[1] We start with the primes, and use «grep» to select only those where the value + 6 also is a prime.

[1] The sequence gives the first number in the pairs, so we have to do the rest. manually.

Running it gives the same result as the wiki page:

$ raku sexy-prime-pairs
1: (5,11)
2: (7,13)
3: (11,17)
4: (13,19)
5: (17,23)
6: (23,29)
7: (31,37)
8: (37,43)
9: (41,47)
10: (47,53)

It is possible to combine the two sequences:

File: sexy-prime-pairs2
my $sexy-primes := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime });

say "{ $_ + 1 }: ($sexy-primes[$_],{ $sexy-primes[$_] + 6 })" for ^10;

The result is more compact, but not quite as readable. The output is the same.

Beyond 10

The challenge asked for the first ten pairs, but we can allow an optional integer argument, which is the number of pairs to display. With 10 as the default, to satisfy the challenge:

File: sexy-prime-pairs3
unit sub MAIN (Int $count = 10);

my $sexy-primes := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime });

say "{ $_ + 1 }: ($sexy-primes[$_],{ $sexy-primes[$_] + 6 })" for ^$count;

Fixing the Sequence

The sequence gave the first number in the pairs, and not the pairs. This may not be a problem, as we are quite capable of computing the other value. But the sequence doesn't deliver what the name implies that it does, and would be annoying if supplied like this in a module.

It is easy to get the seqence to return both values in the pairs:

File: sexy-prime-pairs4
unit sub MAIN (Int $count = 10);

my $sexy-prime-pairs := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime }) \
                                  .map({( $_, $_ + 6 )});

say "{ $_ + 1 }: $sexy-prime-pairs[$_]" for ^$count;

All it took was a map that returns the values as a list.

See docs.raku.org/routine/map for more information about map.

The output isn't exactly the same, though the values are:

1: 5 11
2: 7 13
3: 11 17
4: 13 19
5: 17 23
6: 23 29
7: 31 37
8: 37 43
9: 41 47
10: 47 53

It is easy to fix the output:

File: sexy-prime-pairs5 (changes only)
say "{ $_ + 1 }: ({ $sexy-prime-pairs[$_].join(",") })" for ^$count;

Bonus 1: Sexy prime triplets

The wikipedia article also describes «Sexy prime triplets», which it defines as «triplets of primes (p, p + 6, p + 12) such that p + 18 is composite. A Composite number is defined like this: «A composite number n is a positive integer n>1 which is not prime (by mathworld.wolfram.com).

File: sexy-prime-triplets
unit sub MAIN (Int $count = 10);

my $sexy-prime-triplets := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime \
  && ($_ + 12).is-prime && ! ($_ + 18).is-prime });

say "{ $_ + 1 }: ($sexy-prime-triplets[$_],{ $sexy-prime-triplets[$_] + 6 }, \
     { $sexy-prime-triplets[$_] + 12 })" for ^$count;

Running it gives the same result as the wiki page:

$ raku sexy-prime-triplets
1: (7,13,19)
2: (17,23,29)
3: (31,37,43)
4: (47,53,59)
5: (67,73,79)
6: (97,103,109)
7: (101,107,113)
8: (151,157,163)
9: (167,173,179)
10: (227,233,239)

We should perhaps fix the sequence here as well:

File: sexy-prime-triplets2
unit sub MAIN (Int $count = 10);

my $sexy-prime-triplets := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime &&
 ($_ + 12).is-prime && ! ($_ + 18).is-prime }).map({( $_, $_ + 6, $_ + 12 )});

say "{ $_ + 1 }: ({ $sexy-prime-triplets[$_].join(",") })" for ^$count;

Bonus 2: Sexy prime quadruplets

The wikipedia article also describes «Sexy prime quadruplets», which it defines as «Sexy prime quadruplets (p, p + 6, p + 12, p + 18)». The difference from «Sexy prime triplets» is that the last part (p + 18) now must be a prime, and that the value is part of the result.

File: sexy-prime-quadruplets
unit sub MAIN (Int $count = 10);

my $sexy-prime-quadruplets := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime
  && ($_ + 12).is-prime && ($_ + 18).is-prime });

say "{ $_ + 1 }: ($sexy-prime-quadruplets[$_], \
  { $sexy-prime-quadruplets[$_] + 6 }, \
  { $sexy-prime-quadruplets[$_] + 12 }, \
  { $sexy-prime-quadruplets[$_] + 18 })"
    for ^$count;

Running it gives the same result as the wiki page:

1: (5,11,17,23)
2: (11,17,23,29)
3: (41,47,53,59)
4: (61,67,73,79)
5: (251,257,263,269)
6: (601,607,613,619)
7: (641,647,653,659)
8: (1091,1097,1103,1109)
9: (1481,1487,1493,1499)
10: (1601,1607,1613,1619)

Here it is with the sequence fix applied:

File: sexy-prime-quadruplets2
unit sub MAIN (Int $count = 10);

my $sexy-prime-quadruplets := (1 .. Inf).grep({ .is-prime && ($_ + 6).is-prime
  && ($_ + 12).is-prime && ($_ + 18).is-prime }).map({( $_, $_ + 6, $_ + 12,
      $_ + 18)});

say "{ $_ + 1 }: ({ $sexy-prime-quadruplets[$_].join(",") })" for ^$count;

Bonus 3: Sexy prime ququintuplets

This is actually easy, as there is only 1 ququintuplet: (5,11,17,23,29).

File: sexy-prime-ququintuplets
say "(5,11,17,23,29)";

The reason is quite clever: By adding 6 to any number, in a loop 5 times, one of the values must end with 5 or 0. (As we add 5 + 1, and do it 5 times.) A value ending with the digit 0 is not a prime (as it is either zero or divisible by e.g. 10), and if it ends with 5 it is divisible by 5 - unless it is the number 5 itself, which is a prime. So 5 must be in the ququintuplet, and it has to be the first value.

Challenge #22.2. Sexy Prime Pairs

Write a script to implement Lempel–Ziv–Welch (LZW) compression algorithm. The script should have method to encode/decode algorithm. The wiki page explains the compression algorithm very nicely.

There is two ways to do this:

  • Hard code the legal alphabet in the algorithms (both the encoder and decoder)
  • Deduce the alphabet from the input string, and find a way to forward it to the decoder
I'll do the second one. (But it is easy to rewrite it to do the first one, and I'll get back to that later.)

I have written «encode» and «decode» procedures, but will show the program MAIN wrappers first:

File: lzw (partial)
multi sub MAIN (Str $string)                   # [1]
{
  my ($alphabet, $encoded) = encode($string);  # [2]
  say "'" ~ $alphabet ~ "' $encoded" ;         # [2]
}

multi sub MAIN (Str $alphabet, Str $binary)    # [3]
{
  say decode($alphabet, $binary);              # [4]
}

[1] If we specify a single string as argument to the program, it encodes it.

[2] The encoder returns the alphabet and the encoded string (as a binary number).

[3] If we specify a string and a binary number as argument to the program, it decodes it.

[4] The decoder returns the original string, if we pass it the alphabet and the binary number (with any leading zeroes) we got from the encoder.

The encoder looks like this:

File: lzw (partial)
sub encode (Str $string)                              # [1]
{
  my @alphabet =  ('#', $string.comb).flat.unique;    # [2]
  my @dict     =  @alphabet;                          # [3]
  my %dict     =  @alphabet.kv.reverse;               # [4]

  my @chars = $string.comb;                           # [5]
  my $binary = "";                                    # [6]
   
  while @chars                                        # [7]
  {
    my $char = @chars.shift;                          # [8]

    last if $char eq "#";                             # [9]

    $char ~= @chars.shift while @chars.elems && %dict{$char ~ @chars[0]}; # [10]

    my $bits = @dict.elems.base(2).chars;             # [11]
    my $fmt  = '%0' ~ $bits ~ 'b';                    # [12]
    my $add  = %dict{$char}.fmt($fmt);                # [13]
    $binary ~= $add;                                  # [14]

    if @chars.elems && @chars[0] ne "#"               # [15]
    {
      @dict.push($char ~ @chars[0]);                  # [16]
      %dict{$char ~ @chars[0]} = @dict.end;           # [17]
    }
  }
  
  return @alphabet.join, $binary;                     # [18]
}

[1] Starting with the input string.

[2] The alphabet (as an array) is the unique characters in the input string. Ensure that the "#" character (the stop character) is the first element in the array.

[3] A copy, where we add the multi letter sequences later on.

[4] A reverse lookup. Use the letter (or multi letter string) as lookup, and get the index - which is the value to emit. «kv» on an array gives the keys and values as pairs. Applying «reverse» on it swaps the keys and values, so the resulting hash has the old values as keys, and the keys as values.

[5] The input string as an array of single character, as it is more work to use substrings.

[6] The output, as a sequence of binary digits.

[7] As long as we have any more characters in the input;

[8] • get the first one.

[9] • exit the loop if the character is "#" (the stop character).

[10] Add the next character in the string, as long as the resulting string is present in the dictionary. (Note that «$char» starts with a single character, but can hold several as a result of this while loop.)

[11] Get the number of bits (binary digits) needed to encode the current character,

[12] and use that value as argument to the «fmt» method to get the correct number of digits, with zeros added at the beginning if required.

[13] Get the decimal value from the dictionary, convert it to binary,

[14] and add it to the end of the binary number (or rather, string).

[15] If we have more elements in the input (and it isn't the stop character),

[16] add the string with the first unused character to the dictionary,

[17] and the reverse lookup.

[18] Return the alphabet and the binary string.

Note that this algorithm works with or without using the stop character to end the string.

It is tempting to convert the binary number to decimal and output that one - as it is shorter. But that would play havoc with the decoder, as any leading zeroes in the encoded number would be missing (as leading zeroes are not really a thing in numbers). And they are vital for us, as we count digits from the left. Missing leading zeroes will break the decoder. (Flipping the number before converting it to decimal in the encoder, and again after converting to binary in the decoder would fix the leading zero problem, but would introduce a trailing zero problem instead.) So if we were to convert this binary number in any way, it would have to keep any leading zeroes. One way could be to grab 8 bits at a time, and convert them to an 8-bit number (in the range 0-255).

The decoder is shorter:

File: lzw (partial)
sub decode (Str $alphabet, Str $binary is copy)                         # [1]
{
  my @alphabet = $alphabet.comb;                                        # [2]
  
  @alphabet.unshift('#') unless @alphabet[0] eq "#";                    # [3]

  my $prev;                                                             # [4]
  my $output;                                                           # [5]

  while $binary.chars                                                   # [6]
  {
    my $size = (@alphabet.elems +1).base(2).chars;                      # [7]
    my $current = $binary.substr(0, $size);                             # [8]
    $binary.substr-rw(0, $size) = "";                                   # [9]

    my $value = @alphabet[$current.parse-base(2)];                      # [10]
    @alphabet.push($prev ~ $value) if $prev.defined;                    # [11]
    
    $output ~= $value;                                                  # [12]
    $prev = $value;                                                     # [13]
  }

  return $output;                                                       # [14]
}

[1] We start with the alphabet, and the binary string.

[2] Get the alphabet as an array.

[3] Add «#» as the first element in the alphabet, if not there already. (The encoder places it here for us, so this isn't really necessary.)

[4] The value in the previous iteration, initially empty.

[5] The output string.

[6] While we have binary characters left in the input;

[7] • get the number of binary digits for each character (or character string). This is the number of elemens in the alphabet +1 (to compensate for the fact that we do the computation before adding the value to the alphabet).

[8] • get the correct number of binary digits,

[9] • and remove them from the input.

[10] • convert the binary number to decimal, and get the character (or string) by looking it up in the alphabet (with this number as index).

[11] If we have a previous value (one that is in the alphabet), add the next one.

[12] Add the new value (one or more characters) to the end of the result.

[13] Set up the previous value to hold the current value; ready for the next iteration.

[14] Return the string.

Running it to see that it round trips:

$ raku lzw aslakaslakQQl
'#aslkQ' 0010100011000101000110100001000101010100011

$ raku lzw '#aslkQ' 0010100011000101000110100001000101010100011
aslakaslakQ

Note that you have to quote the alphabet string, so that the shell doesn't choke on the # character. The encoder prints quotes around it to be nice.

Verbose

It may be easier to follow the code if we add some debug output. I have done so, in a version of the program called «lzw-verbose» (which is included in the zip file).

Running the encoder with the verbose flag:

$ raku lzw-verbose --verbose aslakaslakQQl
Dict: # <=> 0 (alphabet)
Dict: a <=> 1 (alphabet)
Dict: s <=> 2 (alphabet)
Dict: l <=> 3 (alphabet)
Dict: k <=> 4 (alphabet)
Dict: Q <=> 5 (alphabet)
Out: a ->  1 -> 001
Dict: as <=> 6
Out: s ->  2 -> 010
Dict: sl <=> 7
Out: l ->  3 -> 0011
Dict: la <=> 8
Out: a ->  1 -> 0001
Dict: ak <=> 9
Out: k ->  4 -> 0100
Dict: ka <=> 10
Out: as ->  6 -> 0110
Dict: asl <=> 11
Out: la ->  8 -> 1000
Dict: lak <=> 12
Out: k ->  4 -> 0100
Dict: kQ <=> 13
Out: Q ->  5 -> 0101
Dict: QQ <=> 14
Out: Q ->  5 -> 0101
Dict: Ql <=> 15
Out: l ->  3 -> 00011
'#aslkQ' 0010100011000101000110100001000101010100011

We can see that 3 bits is enough to encode the initial alphabet. When «sl» has been added it switches to 4 bits, and after «Ql» we get 5 bits.

Running the decoder with the verbose flag:

$ raku lzw-verbose --verbose "#aslkQ" 0010100011000101000110100001000101010100011
Dict: # <=> 0 (alphabet)
Dict: a <=> 1 (alphabet)
Dict: s <=> 2 (alphabet)
Dict: l <=> 3 (alphabet)
Dict: k <=> 4 (alphabet)
Dict: Q <=> 5 (alphabet)
In: 001 -> 1 -> a
Dict: as <=> 6
In: 010 -> 2 -> s
Dict: sl <=> 7
In: 0011 -> 3 -> l
Dict: la <=> 8
In: 0001 -> 1 -> a
Dict: ak <=> 9
In: 0100 -> 4 -> k
Dict: kas <=> 10
In: 0110 -> 6 -> as
Dict: asla <=> 11
In: 1000 -> 8 -> la
Dict: lak <=> 12
In: 0100 -> 4 -> k
Dict: kQ <=> 13
In: 0101 -> 5 -> Q
Dict: QQ <=> 14
In: 0101 -> 5 -> Q
Dict: Ql <=> 15
In: 00011 -> 3 -> l
aslakaslakQQl

Fixed Alphabet

Using a fixed (or hard coded) alphabet is quite easy. Define the alphabet globally in the file:

my @alphabet = ("#", "A" ... "Z", "a" ... "z", 0 .. 9).flat;

Then remove the «$alphabet» variables in the MAIN wrappers:

multi sub MAIN (Str $string, :$verbose)
{
  say encode($string, :$verbose);
}

multi sub MAIN Str $binary, :$verbose)
{
  say decode($binary, :$verbose);
}

Then remove the first line in «encode» (the declaration of «@alphabet») and change the last line to this:

  return $binary;

«decode» is a little harder. Change the top (above the «if $verbose» line) to:

sub decode (Str $binary is copy, :$verbose)
{
  my @dict = @alphabet

Then rename «@alphabet» to «@dict» in the rest of the procedure.

And that's it.