Tuesday, December 22, 2020

An initial exploration of superpermutations of multisets

 A superpermutation of a set is a sequence of n distinct symbols which contains all n! permutations of those n symbols as substrings. A trivial superpermutation just lists all possible permutations one after another, for a total length of n*n!. However, considerably smaller superpermutations can be achieved by overlapping permutations with common prefixes and suffixes.

Without loss of generality, we can label all of the elements of a finite set with integers, and work exclusively with sets of consecutive integers. Each set, and its corresponding set of superpermutations, can then be parameterized with a single number--its size, or cardinality.

The unique superpermutation of the set of size 1 is "1".

The superpermutations of the set of size 2 are "121" and "212" (which are equivalent up to relabelling).

And minimal-length superpermutations are known for sets of up to size 5.

But what if, instead of working with proper sets, in which each symbol must occur exactly once, we decided to look at permutations of multisets instead?

Multisets allow each element to occur any number of times, known as the multiplicity of the element. The (equivalence classes of) multisets are no longer characterized by a single number; instead, we can talk about the support cardinality (the number of different kinds of elements it has), the multiplicities of each of those elements, and the multiset cardinality, which is the sum of the multiplicities. All of that information can be encoded as an ordered list of multiplicities (which, for the sake of consistency and without loss of generality, we can assert must be non-decreasing). Thus, the multiset { 1, 1, 1 } can be represented by the list [3], and the multiset { 1, 2, 2 } can be represented by the list [1, 2].

This provides several different dimensions along which to extend the size of a multiset, with different implications for their effect on the superpermutation.

Multisets with all multiplicities equal to 1 are equivalent to sets of the same cardinality, and have the same superpermutations. Due to the repeated elements, however, multisets with multiplicities greater than one have fewer distinct permutations than sets of the same cardinality, and thus have shorter minimal superpermutations. In the trivial case of multisets with support cardinality 1, they have a single permutation (a list of their single element repeated as many times as its multiplicity), which is also the unique superpermutation--just like sets of cardinality 1.

Things get more interesting for multisets of the form [1, n], with support cardinality 2. These have minimal superpermutations of the form 2{n}12{n}--i.e., the second element repeated as many times as its multiplicity on either side of the first element. E.g., 

super([1, 1]) = 212, equivalent to the superpermutation of the set of cardinality 22
super([1, 2]) = 22122
super([1, 3]) = 2221222

etc.

Each individual permutation consists of inserting the first symbol somewhere in the list of repetitions of the second symbol, and can be obtained by sliding a window of length n+1 across the superpermutation, and the lengths of these superpermutations are trivial characterized by the formula L([1, n]) = 2n + 1.

If we move on to multisets of support cardinality 3, however, things suddenly get considerably more complicated. Adding only a single additional element with multiplicity 1 increases the number of permutations by a factor of n+1 (going from 3 for n=2 to 12=3(n+1) for n=3), as the additional element can be placed in every position in each of the previous permutations. Through exhaustive search, the minimal unique (i.e., with mirrored pairs and pairs which can be turned into each other via relabelling) superpermutations of [1, 1, 2] are

13321331233132313
33123313231332133

with a length of 17.

A conservative lower bound on the possible lengths of minimal superpermutations can be acquired by observing that the number of permutations achievable by sliding a window along the superpermutation must equal or exceed the number of distinct permutations of the multiset. The number of windows of length n over a string of length k, where n<=k, is simply k - n + 1. The permutations of a multiset S are given by the formula C(S)!/P(m! | m in M(S)) where C is the cardinality function, P is the product operator, and M is the multiplicity function; in other words, it is the factorial of the cardinality of S divided the product of the factorials of each of the multiplicities of S. Thus,

k >= C(S) + C(S)!/P(m! | m in M(S)) - 1

For the [1, n] multisets, this formula gives lower-bound lengths of 3, 5, and 7, respectively--so it is in fact a tight bound. For the multiset [1, 1, 2], however, this gives lower bound length of 4 + 4!/(1!*1!*2!) - 1 = 4 + 24/2 - 1 = 4 + 12 - 1 = 15--while the actual minimal length, discovered by exhaustive search, was 17. And in fact, it is easy to see that each of the superpermutations have 2 length-4 substrings which are not permutations of the multiset, accounting for the extra length. This is unsurprising, since it is already well known that this is not a tight lower bound for superpermutations of sets, and multisets are a strict superset of sets.

There are a few ways in which we could try to extend this result--increase the value of n again, to create a parameterized family of [1, 1, n] multisets; extend the number of singleton elements, to create a parameterized family of [1{n}, 2] multisets, with increasing support cardinality; or pivoting entirely to see what happens with [n, n] multisets, with support cardinality 2 and multiset cardinality 2n.

It turns out that increasing the number of unique elements increases the number of permutations much faster than increasing the multiplicity of an individual element. Trying to brute force [1, 1, 1, 2] (with 60 permutations) just causes my software to run out of memory and fall over and die. That could probably be fixed, but it makes it much easier to go in different directions for now.

The Family [1, 1, n]

Doing a complete brute-force search of the superpermutation space for n > 2 takes a very long time; however, some superpermutations closely approaching the lower bound can be found relatively quickly.

The multiset [1 1, 3] has a lower bound superpermutation length of 24. It appears to have optimal superpermutations of length 27 -- 3 more than the lower bound. An example is 332313323313233312333213332.

The multiset [1, 1, 4] has a lower bound superpermutation length of 35. It appears to have optimal superpermutations of length 39 -- 4 more than the lower bound. An example is 323331323333123333213333231333233133233.

The multiset [1, 1, 5] has a lower bound superpermutation length of 48. It appears to have optimal superpermutations of length 53 -- 5 more than the lower bound. An example is 33233313323333132333331233333213333323133332331333233.

Recalling that [1, 1, 2] had a minimal superpermutation length of 17 -- two more than its theoretical lower bound -- then assuming that this empirical pattern holds, this family appears to have optimal superpermutation lengths given by k = C(S) + C(S)!/P(m! | m in M(S)) + n - 1, which can be simplified to

k = 2n + (n+1)(n+2) + 1

The Family [n, n]

The multiset [2,2] has a total of 6 permutations, and a lower bound superpermutation length of 9. Its single unique superpermutation is

1122112121

which has a single length-4 substring which is not a permutation of the multiset, and a total length of 10.

As with the [1, 1, n] family, doing a complete brute-force search for n>2 takes a very long time, but we can find probable solutions fairly quickly.

The multiset [3, 3] has a lower bound superpermutation length of 25, and an apparent optimal superpermutation of length 29. An example is 12221112221211221212122112122.

The multiset [4, 4] has a lower bound superpermutation length of 77, and an apparent optimal superpermutation length of 117. And example is 222121112222111122221121222111212221121212211122122111222121121221211221212112221122112211211222121212122112122211211.

The multiset [5, 5] causes my search process to run out of memory and fall over. :(

The multiset [1, 1], equivalent to the set { 1, 2 } has a minimum bound of 3, and an actual minimal superpermutation length of 3.

The differences from the minimum bound formula are 0, 1, 4, and 20 for n = 1, 2, 3, and 4 respectively.

Unfortunately, there is a large number of matches for the sequence 0, 1, 4, 20... in The Online Encyclopedia of Integer Sequences... and none for 3, 10, 29, 117. So, without a more principled method construction for these superpermutations, I am unfortunately at a loss to properly characterize them.

No comments:

Post a Comment