Python – How to Find all Substring Combinations in a String
Wednesday, December 12th, 2007I got asked this question recently. Basically how many combinations can you make out of a string like “abcd”. For example the answer for “abcd” is:
a b c d ab ac ad bd cb cd abd acb acd cbd acbd
I believe this is basically the same question as find all subsets in a set. Please correct me if I’m wrong on that.
Here’s the code I finally came up with for my first attempt:
"""
Enumerate all subsets of characters in a string
"""
#Make a convenient sorter we'll user later
def len_alpha_sorter(val1,val2):
"""#sort by len then by alpha"""
if len(val1)==len(val2):
return cmp(val1,val2)
else:
return cmp(len(val1),len(val2))
def cross_lists_of_sets(lst1,lst2):
"""return list of unions for every combination of sets
across the input lists. We get duplicates here
which we'll handle later. This should be fixed later too."""
cplst1,cplst2=lst1[:],lst2[:]
retlst=[]
for set1 in cplst1:
for set2 in cplst2:
retlst.append(set1.union(set2))
return retlst
def get_all_subsets(set_val):
"""
Main function to get the subsets. See comments for details.
"""
#temp is a container to hold lists of sets, keyed by the length
#of the sets it will hold.
temp={}
#get individual characters
temp[1]=[set([v]) for v in set_val]
#cross list of sets for previous size with sets for individual
#characters to get a list of sets for the next size.
for i in range(2,len(set_val)):
temp[i]=cross_lists_of_sets(temp[i-1],temp[1])
#add in full string as a subset
temp[len(set_val)]=set([set_val])
#Put each list in temp into one big list instead
all_sets_w_dupes=[]
[all_sets_w_dupes.extend(v) for v in temp.values()]
#We will have duplicates at this point, let's clean up, by
#making the sets into frozensets, taking the set of that list
#to get unique elements only, sorting it nicely, and finally
#putting it together into a string.
return '\n'.join(
sorted(
set([''.join(frozenset(s)) for s in all_sets_w_dupes]),
len_alpha_sorter)
)
Can you A.) Figure out the complexity of this beast? and B.) Find an algorithm with lower complexity? The current algorightm must be mamoth since this code ran overnight on a string with less than 15 characters!