The main trick is to count for each possible number of citations, how many times we see this number in our citations. Note, that if number of citations is more than total number of papers N, we can reduce this numer to N and nothing will change. Let me explain my solutoin given test example [3,0,6,1,5].
We create array, which I called buckets = [1, 1, 0, 1, 0, 2], where buckets[i] is number of papers with i citations if i < N and bucket[N] is number of papers with >=N citations.
Now, we create accum array, where accum[0] is number of papers with >=1 citation, accum[1] is number of papers with >=2 citations and so on. When we evaluate it for our example we can see, that it is equal to accum = [4,3,3,2,2]. Note, that we start with 1 citation, not with zero, that is why we need to use accum[1:] in our code.
Finally, we need to go through this array and find the bigest number i, for which accum[i] >= i + 1 and return i + 1, in our example it is equal to 3.
Complexity
Complexity is O(n), though we traverse our data 4 times, and it is not the optimal solution. We can evaluate cumulative sums in place, and compare them in place with i + 1 and keep the index of maximum reached so far, and interrupt when inequality accum[i] >= i + 1 does not hold anymore. Howerer I like cumulative sums and code is more clean in my way.
class Solution:
def hIndex(self, citations):
N = len(citations)
buckets = [0] * (N + 1)
for elem in citations:
buckets[min(elem, N)] += 1
accum = list(accumulate(buckets[1:][::-1]))[::-1]
compar = [accum[i] >= i + 1 for i in range(N)]
return (compar + [0]).index(0)
|
eg : ans = Solution()
ans.hIndex([3,0,6,1,5])
Output -
3