Найти уникальные пары в списке пар

У меня есть (большой) список списков целых чисел, например,

a = [
    [1, 2],
    [3, 6],
    [2, 1],
    [3, 5],
    [3, 6]
    ]

Большинство пар появятся дважды, где порядок целых чисел не имеет значения (т.е. [1, 2] эквивалентен [2, 1]). Теперь я хотел бы найти пары, которые появляются только один раз, и получить список Boolean, указывающий на это. Для приведенного выше примера

b = [False, False, False, True, False]

Так как a обычно велико, я бы хотел избежать явных циклов. Сопоставление с frozenset может быть рекомендовано, но я не уверен, что это перебор.

Ответы

Ответ 1

Здесь решение с NumPy, которое в 10 раз быстрее предлагаемого решения frozenset:

a = numpy.array(a)
a.sort(axis=1)
b = numpy.ascontiguousarray(a).view(
        numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
        )
_, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
print(ct[inv] == 1)

Сортировка выполняется быстро и гарантирует, что ребра [i, j], [j, i] в исходном массиве идентифицируются друг с другом. Гораздо быстрее, чем frozenset или tuple s.
Неопределенность строк, вдохновленная fooobar.com/questions/46697/....

Сравнение скорости для разных размеров массива:

Сюжет был создан с помощью

from collections import Counter
import numpy
import perfplot


def fs(a):
    ctr = Counter(frozenset(x) for x in a)
    b = [ctr[frozenset(x)] == 1 for x in a]
    return b


def with_numpy(a):
    a = numpy.array(a)
    a.sort(axis=1)
    b = numpy.ascontiguousarray(a).view(
            numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
            )
    _, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
    res = ct[inv] == 1
    return res


perfplot.show(
        setup=lambda n: numpy.random.randint(0, 10, size=(n, 2)),
        kernels=[fs, with_numpy],
        labels=['frozenset', 'numpy'],
        n_range=[2**k for k in range(12)],
        xlabel='len(a)',
        logx=True,
        logy=True,
        )

Ответ 2

ctr = Counter(frozenset(x) for x in a)
b = [ctr[frozenset(x)] == 1 for x in a]

Мы можем использовать счетчик, чтобы получить подсчеты каждого списка (перевернуть список на frozenset, чтобы игнорировать порядок), а затем для каждого списка проверьте, отображается ли он только один раз.

Ответ 3

Вы можете отсканировать список от начала до конца, сохранив при этом map встречающихся пар в свою первую позицию. Всякий раз, когда вы обрабатываете пару, вы проверяете, встречались ли вы раньше. Если в этом случае, как первый индекс столкновения в b, так и текущий индекс столкновений должны быть установлены на False. В противном случае мы просто добавим текущий индекс к карте встречающихся пар и ничего не изменим о b. b начнет сначала все True. Чтобы сохранить что-то эквивалентное wrt [1,2] и [2,1], я сначала просто сортировал пару, чтобы получить стабильное представление. Код будет выглядеть примерно так:

def proc(a):
  b = [True] * len(a) # Better way to allocate this
  filter = {}
  idx = 0
  for p in a:
    m = min(p)
    M = max(p)
    pp = (m, M)
    if pp in filter:
      # We've found the element once previously
      # Need to mark both it and the current value as "False"
      # If we encounter pp multiple times, we'll set the initial
      # value to False multiple times, but that not an issue
      b[filter[pp]] = False
      b[idx] = False
    else:
      # This is the first time we encounter pp, so we just add it
      # to the filter for possible later encounters, but don't affect
      # b at all.
      filter[pp] = idx
    idx++
  return b

Сложность времени O(len(a)), которая хороша, но сложность пространства также O(len(a)) (для filter), поэтому это может быть не так велико. В зависимости от того, насколько вы гибки, вы можете использовать приблизительный фильтр, такой как фильтр Bloom.

Ответ 4

#-*- coding : utf-8 -*-
a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
result = filter(lambda el:(a.count([el[0],el[1]]) + a.count([el[1],el[0]]) == 1),a)
bool_res = [ (a.count([el[0],el[1]]) + a.count([el[1],el[0]]) == 1) for el in a]
print result
print bool_res

который дает:

[[3, 5]]
[False, False, False, True, False]

Ответ 5

Используйте словарь для решения O (n).

a = [ [1, 2], [3, 6], [2, 1], [3, 5], [3, 6] ]

dict = {}
boolList = []

# Iterate through a
for i in range (len(a)):

    # Assume that this element is not a duplicate
    # This 'True' is added to the corresponding index i of boolList
    boolList += [True]

    # Set elem to the current pair in the list
    elem = a[i]

    # If elem is in ascending order, it will be entered into the map as is
    if elem[0] <= elem[1]:
        key = repr(elem)
    # If not, change it into ascending order so keys can easily be compared
    else:
        key = repr( [ elem[1] ] + [ elem[0] ])

    # If this pair has not yet been seen, add it as a key to the dictionary
    # with the value a list containing its index in a.
    if key not in dict:
        dict[key] = [i]
    # If this pair is a duploicate, add the new index to the dict. The value
    # of the key will contain a list containing the indeces of that pair in a.
    else:
        # Change the value to contain the new index
        dict[key] += [i]

        # Change boolList for this to True for this index
        boolList[i] = False

        # If this is the first duplicate for the pair, make the first
        # occurrence of the pair into a duplicate as well.
        if len(dict[key]) <= 2:
            boolList[ dict[key][0] ] = False

print a
print boolList