Python - найти элемент с максимальными вхождениями в список

В Python у меня есть список:

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

Я хочу идентифицировать элемент, который имел наибольшее количество раз. Я могу решить это, но мне нужен самый быстрый способ сделать это. Я знаю, что есть хороший ответ на Питону.

Ответы

Ответ 1

Вот решение defaultdict, которое будет работать с версиями Python версии 2.5 и выше:

from collections import defaultdict

L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
d = defaultdict(int)
for i in L:
    d[i] += 1
result = max(d.iteritems(), key=lambda x: x[1])
print result
# (4, 6)
# The number 4 occurs 6 times

Обратите внимание, если L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 7, 7, 7, 7, 7, 56, 6, 7, 67] то есть шесть 4 и шесть 7. Однако результатом будет (4, 6) то есть шесть 4s.

Ответ 2

from collections import Counter
most_common,num_most_common = Counter(L).most_common(1)[0] # 4, 6 times

Для более старых версий Python (< 2.7) вы можете использовать этот рецепт, чтобы получить Counter.

Ответ 3

Я удивлен, что никто не упомянул простейшее решение max() с ключом list.count:

max(lst,key=lst.count)

Пример:

>>> lst = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
>>> max(lst,key=lst.count)
4

Это работает в Python 3 или 2, но обратите внимание, что он возвращает только самый частый элемент, а не частоту. Кроме того, в случае розыгрыша (т.е. Наиболее часто встречающегося элемента) возвращается только один элемент.

Хотя временная сложность использования max() хуже, чем использование Counter.most_common(1) качестве комментариев PM 2Ring, этот подход выигрывает от быстрой реализации C и я считаю, что этот подход наиболее быстр для коротких списков, но медленнее для более крупных (Python 3.6 время показанное в IPython 5.3):

In [1]: from collections import Counter
   ...: 
   ...: def f1(lst):
   ...:     return max(lst, key = lst.count)
   ...: 
   ...: def f2(lst):
   ...:     return Counter(lst).most_common(1)
   ...: 
   ...: lst0 = [1,2,3,4,3]
   ...: lst1 = lst0[:] * 100
   ...: 

In [2]: %timeit -n 10 f1(lst0)
10 loops, best of 3: 3.32 us per loop

In [3]: %timeit -n 10 f2(lst0)
10 loops, best of 3: 26 us per loop

In [4]: %timeit -n 10 f1(lst1)
10 loops, best of 3: 4.04 ms per loop

In [5]: %timeit -n 10 f2(lst1)
10 loops, best of 3: 75.6 us per loop

Ответ 4

В вашем вопросе вы попросили максимально быстрый способ сделать это. Как неоднократно демонстрировалось, особенно с Python, интуиция не является надежным руководством: вам нужно измерить.

Вот простой тест нескольких различных реализаций:

import sys
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit

L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]

def max_occurrences_1a(seq=L):
    "dict iteritems"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_1b(seq=L):
    "dict items"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))

def max_occurrences_2(seq=L):
    "defaultdict iteritems"
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_3a(seq=L):
    "sort groupby generator expression"
    return max(((k, sum(1 for i in g)) for k, g in groupby(sorted(seq))), key=itemgetter(1))

def max_occurrences_3b(seq=L):
    "sort groupby list comprehension"
    return max([(k, sum(1 for i in g)) for k, g in groupby(sorted(seq))], key=itemgetter(1))

def max_occurrences_4(seq=L):
    "counter"
    return Counter(L).most_common(1)[0]

versions = [max_occurrences_1a, max_occurrences_1b, max_occurrences_2, max_occurrences_3a, max_occurrences_3b, max_occurrences_4]

print sys.version, "\n"

for vers in versions:
    print vers.__doc__, vers(), timeit(vers, number=20000)

Результаты на моей машине:

2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] 

dict iteritems (4, 6) 0.202214956284
dict items (4, 6) 0.208412885666
defaultdict iteritems (4, 6) 0.221301078796
sort groupby generator expression (4, 6) 0.383440971375
sort groupby list comprehension (4, 6) 0.402786016464
counter (4, 6) 0.564319133759

Итак, кажется, что решение Counter не является самым быстрым. И, в этом случае, по крайней мере, groupby работает быстрее. defaultdict хорошо, но вы платите немного за его удобство; он немного быстрее использует обычный dict с get.

Что произойдет, если список намного больше? Добавьте L *= 10000 к тесту выше и уменьшите количество повторений до 200:

dict iteritems (4, 60000) 10.3451900482
dict items (4, 60000) 10.2988479137
defaultdict iteritems (4, 60000) 5.52838587761
sort groupby generator expression (4, 60000) 11.9538850784
sort groupby list comprehension (4, 60000) 12.1327362061
counter (4, 60000) 14.7495789528

Теперь defaultdict является явным победителем. Так что, возможно, стоимость метода get и потери дополнения inplace складывается (рассмотрение сгенерированного кода остается в виде упражнения).

Но с измененными тестовыми данными количество уникальных значений элементов не изменилось, поэтому предположительно dict и defaultdict имеют преимущество над другими реализациями. Итак, что произойдет, если мы будем использовать более крупный список, но существенно увеличим количество уникальных предметов? Заменив инициализацию L следующим образом:

LL = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
L = []
for i in xrange(1,10001):
    L.extend(l * i for l in LL)

dict iteritems (2520, 13) 17.9935798645
dict items (2520, 13) 21.8974409103
defaultdict iteritems (2520, 13) 16.8289561272
sort groupby generator expression (2520, 13) 33.853593111
sort groupby list comprehension (2520, 13) 36.1303369999
counter (2520, 13) 22.626899004

Итак, теперь Counter явно быстрее решений groupby, но все же медленнее версий iteritems dict и defaultdict.

Целью этих примеров является не создание оптимального решения. Дело в том, что часто не существует одного оптимального общего решения. Кроме того, существуют другие критерии эффективности. Требования к памяти будут существенно различаться между решениями, и по мере увеличения размера ввода требования к памяти могут стать основным фактором при выборе алгоритма.

Нижняя строка: все зависит и вам нужно измерить.

Ответ 5

Возможно, метод most_common()

Ответ 6

Я получил наилучшие результаты с помощью модуля groupby от itertools с помощью этой функции, используя Python 3.5.2:

from itertools import groupby

a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

def occurrence():
    occurrence, num_times = 0, 0
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        if val >= occurrence:
            occurrence, num_times =  key, val
    return occurrence, num_times

occurrence, num_times = occurrence()
print("%d occurred %d times which is the highest number of times" % (occurrence, num_times))

Вывод:

4 occurred 6 times which is the highest number of times

Протестируйте с помощью модуля timeit от timeit.

Я использовал этот script для моего теста с number= 20000:

from itertools import groupby

def occurrence():
    a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
    occurrence, num_times = 0, 0
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        if val >= occurrence:
            occurrence, num_times =  key, val
    return occurrence, num_times

if __name__ == '__main__':
    from timeit import timeit
    print(timeit("occurrence()", setup = "from __main__ import occurrence",  number = 20000))

Выход (лучший):

0.1893607140000313

Ответ 7

Простой способ без каких-либо библиотек или наборов

def mcount(l):
  n = []                  #To store count of each elements
  for x in l:
      count = 0
      for i in range(len(l)):
          if x == l[i]:
              count+=1
      n.append(count)
  a = max(n)              #largest in counts list
  for i in range(len(n)):
      if n[i] == a:
          return(l[i],a)  #element,frequency
  return                  #if something goes wrong

Ответ 8

Я хочу добавить другое решение, которое выглядит красиво и быстро для коротких списков.

def mc(seq=L):
    "max/count"
    max_element = max(seq, key=seq.count)
    return (max_element, seq.count(max_element))

Вы можете сравнить это с кодом, предоставленным Ned Deily, который даст вам эти результаты для самого маленького тестового примера:

3.5.2 (default, Nov  7 2016, 11:31:36) 
[GCC 6.2.1 20160830] 

dict iteritems (4, 6) 0.2069783889998289
dict items (4, 6) 0.20462976200065896
defaultdict iteritems (4, 6) 0.2095775119996688
sort groupby generator expression (4, 6) 0.4473949929997616
sort groupby list comprehension (4, 6) 0.4367636879997008
counter (4, 6) 0.3618192010007988
max/count (4, 6) 0.20328268999946886

Но будьте осторожны, он неэффективен и, таким образом, становится очень медленным для больших списков!

Ответ 9

Ниже приведено решение, с которым я столкнулся, если в строке есть несколько символов, имеющих самую высокую частоту.

mystr = input("enter string: ")
#define dictionary to store characters and their frequencies
mydict = {}
#get the unique characters
unique_chars = sorted(set(mystr),key = mystr.index)
#store the characters and their respective frequencies in the dictionary
for c in unique_chars:
    ctr = 0
    for d in mystr:
        if d != " " and d == c:
            ctr = ctr + 1
    mydict[c] = ctr
print(mydict)
#store the maximum frequency
max_freq = max(mydict.values())
print("the highest frequency of occurence: ",max_freq)
#print all characters with highest frequency
print("the characters are:")
for k,v in mydict.items():
    if v == max_freq:
        print(k)

Вход: "привет люди"

Вывод:

{'o': 2, 'p': 2, 'h': 1, ' ': 0, 'e': 3, 'l': 3}

самая высокая частота появления: 3

символы:

e

l

Ответ 10

Простой и лучший код:

def max_occ(lst,x):
    count=0
    for i in lst:
        if (i==x):
            count=count+1
    return count

lst=[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
x=max(lst,key=lst.count)
print(x,"occurs ",max_occ(lst,x),"times")

Выход: 4 происходит 6 раз

Ответ 11

Мой (простой) код (три месяца изучения Python):

def more_frequent_item(lst):
    new_lst = []
    times = 0
    for item in lst:
        count_num = lst.count(item)
        new_lst.append(count_num)
        times = max(new_lst)
    key = max(lst, key=lst.count)
    print("In the list: ")
    print(lst)
    print("The most frequent item is " + str(key) + ". Appears " + str(times) + " times in this list.")


more_frequent_item([1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67])

Выход будет:

In the list: 
[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
The most frequent item is 4. Appears 6 times in this list.

Ответ 12

Если вы используете Python 3.4 или выше, вы можете использовать statistics.mode()

>>> import statistics
>>> L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67] 
>>> statistics.mode(L)
4

Обратите внимание, что при этом будет выдано statistics.StatisticsError, если список пуст или если не существует только одного наиболее распространенного значения.

Ответ 13

может что-то вроде этого:

testList = [1, 2, 3, 4, 2, 2, 1, 4, 4] print(max(set(testList), key = testList.count))