Substrings
and Queues
with Raku

by Arne Sommer

Substrings and Queues with Raku

[23] Published 27. June 2019.

Perl 6 → Raku

This article has been moved from «perl6.eu» and updated to reflect the language rename in 2019.

This is my response to the Perl Weekly Challenge #18.

Challenge #18.1

Write a script that takes 2 or more strings as command line parameters and print the longest common substring. For example, the longest common substring of the strings “ABABC”, “BABCA” and “ABCBA” is string “ABC” of length 3. Other common substrings are “A”, “AB”, “B”, “BA”, “BC” and “C”.

Please check this wiki page for details.

The wikipedia article presents a solution in the Pseudocode section, but it applies to a situation with two strings only. Extending it to work with morestrings is complicated (as the algorithm itself is complicated to understand as it is, even without trying to extend it).

So I'll have a try at a completely different approach: compute all the substrings for each string, and take the intersection of them. And finally print the largest remaining substring.

First the procedure that gives all the substrings (with substr) of a given string:

File: lcs (partial)
sub substrings ($string)
{
  my %substrings;
  
  for ^$string.chars -> $i
  {
    for 1 .. $string.chars - $i -> $j
    {
      # say $string.substr($i, $j) ~ "\t($i,$j)";
      %substrings{ $string.substr($i, $j) } = True;
    }
  }
  return %substrings;
}

See docs.raku.org/routine/substr for more information about substr.

Uncomment the «say» line to see that it does what it should. (The numbers in the parens are the start index and the number of characters):

> substrings("1233456");
1	(0,1)
12	(0,2)
123	(0,3)
1233	(0,4)
12334	(0,5)
123345	(0,6)
1233456	(0,7)
2	(1,1)
23	(1,2)
233	(1,3)
2334	(1,4)
23345	(1,5)
233456	(1,6)
3	(2,1)
33	(2,2)
334	(2,3)
3345	(2,4)
33456	(2,5)
3	(3,1)
34	(3,2)
345	(3,3)
3456	(3,4)
4	(4,1)
45	(4,2)
456	(4,3)
5	(5,1)
56	(5,2)
6	(6,1)

Note that the procedure returns the substrings in a hash, so any duplicates are gone.

And now the main program, which is surprisingly short:

File: lcs (partial)
sub MAIN (*@strings where @strings.elems >= 2)                       # [1]
{
  my %common = substrings( @strings.shift );                         # [2]

  %common = %common ∩ substrings($_) for @strings;                   # [3]

  .say for %common.keys.grep({ .chars == %common.keys>>.chars.max }); # [4]
}

[1] A «MAIN» wrapper. It requires two or more arguments, and they are slurped into an array.

[2] We start with all the substrings of the first string.

[3] Then we loop through the rest of the strings, obtain the substrings and get the intersection (the character, or you can use the ASCII version (&)) of the two sets (the elements common to both sets). The result is a set of common substrings, i.e. substrings that are present in all of the original strings.

[4] We want the largest common substring, so we get the length of the largest one (with %common.keys>>.chars.max), and print the strings with that length. There can be more than one.

See docs.raku.org/routine/(&), infix ∩ for more information about the intersection operator and (&).

The hyper method call operator >>. applies the method on the right on each element in the list on the left, and returns a list. See docs.raku.org/language/operators#index-entry-methodop_>>. for more information.

Running it:

$ perl6 lcs 1233456 12eeeeeeeeeeeeeeeeeeeeeeee
12

$ perl6 lcs 1233456 12eeeeee56
12
56

$ perl6 lcs aa1bb bb1cc cc1aa
1

$ perl6 lcs ABABC BABCA ABCBA
ABC

The last on is the example given in the challenge.

The complete program:

File: lcs
sub MAIN (*@strings where @strings.elems >= 2)
{
  my %common = substrings( @strings.shift );

  %common = %common ∩ substrings($_) for @strings;

  .say for %common.keys.grep({ .chars == %common.keys>>.chars.max });
}


sub substrings ($string)
{
  my %substrings;
  
  for ^$string.chars -> $i
  {
    for 1 .. $string.chars - $i -> $j
    {
      %substrings{ $string.substr($i, $j) } = True;
    }
  }
  return %substrings;
}

The interesction operator works on Sets, which is a realative to a hash but the values can only be True (either the key is in the Set, or it isn't). A hash, as I give it, is simply coerced to a Set by the operator, and it works out.

See the documentation's introductory page to Sets, bags, and mixes for more information about Sets and the other special types available.

Challenge #18.2

Write a script to implement Priority Queue. It is like regular queue except each element has a priority associated with it. In a priority queue, an element with high priority is served before an element with low priority.

Please check this wiki page for more informations.

It should serve the following operations:

  1. is_empty: check whether the queue has no elements.
  2. insert_with_priority: add an element to the queue with an associated priority.
  3. pull_highest_priority_element: remove the element from the queue that has the highest priority, and return it. If two elements have the same priority, then return element added first.

I have chosen to implement the Queue as a class (and module). I'll present the program using it first:

File: priority-queue
use lib "lib";                                        # [1]

use PriorityQueue;                                    # [1]

my $PQ = PriorityQueue.new;                           # [2]

$PQ.insert_with_priority("AA$_", (^99).pick) for ^10; # [3]

while ! $PQ.is_empty                                  # [4]
{
  say $PQ.pull_highest_priority_element;              # [5]
}

[1] Load the module, which is placed in the «lib» subdirectory.

[2] Set up a queue object.

[3] A loop that inserts 10 thingys into the queue. The thingys are plain strings («AA0» to «AA9»), and they are given a random priority (between 0 and 98) by pick.

[4] While there are elements in the queue,

[5] • retrieve the one with the highest priority, and print the name of the thingy.

See docs.raku.org/routine/pick for more information about «pick».

The wikipedia page describes different approaches to the code in the module itself (naive impementations, usual implementations, and specialized heaps). But Raku has a lot of power, so we can get away with a rather minimalistic piece of code - and still have good performance. I have chosen to use a hash:

File: lib/PriorityQueue.rakumod
unit class PriorityQueue;    # [1]

has %!queue;                 # [2]

method is_empty              # [3]
{
  return ! %!queue.elems;
}

method insert_with_priority ($payload, Int $priority)  # [3] [4]
{
  say ":: $payload ($priority)";                       # [5]

  %!queue{$priority}.push: $payload;                   # [6]
}

method pull_highest_priority_element                            # [3]
{
  my $priority = %!queue.keys.max;                              # [7]

  my $element = @(%!queue{$priority}).shift;                    # [8]

  %!queue{$priority}:delete unless @(%!queue{$priority}).elems; # [9]

  return $element;                                              # [10]
}

[1] The class name.

[2] The class has one element, the queue itself, which is a hash.

[3] The class has the three methods described in the challenge.

[4] I have chosen to require the priority to be an integer.

[5] Debug output, useful to check that it works whilst developing or testing.

[6] Add the new element to the queue. The hash key is the priority, and the new element is added to the old value which is a list (with «push»). The value is thus a list of elements with the same priority with the latest addition at the end.

[7] Get the highest priority in the queue.

[8] Get the first element with that priority (from the list we get by using the priority as key in the queue hash).

[9] Remove the element (the list) from the hash if the list is empty (if we removed the only item in the list), as it isn't removed automatically.

[10] Return the element.

Running it:

$ perl6 priority-queue
:: AA0 (36)
:: AA1 (47)
:: AA2 (65)
:: AA3 (98)
:: AA4 (25)
:: AA5 (36)
:: AA6 (32)
:: AA7 (24)
:: AA8 (56)
:: AA9 (20)
AA3
AA2
AA8
AA1
AA0
AA5
AA6
AA4
AA7
AA9

The elements are printed in correct order.

Running it again until we get two elements with the same priority (82), to see that it works as intended:

./priority-queue
:: AA0 (28)
:: AA1 (41)
:: AA2 (1)
:: AA3 (82)
:: AA4 (56)
:: AA5 (18)
:: AA6 (10)
:: AA7 (94)
:: AA8 (12)
:: AA9 (82)
AA7
AA3
AA9
AA4
AA1
AA0
AA5
AA8
AA6
AA2

And that's it.