LeetCode #28 Implement strStr()
Problem
Implement strStr().
Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
Example 1:
Input: haystack = "hello", needle = "ll"
Output: 2
Example 2:
Input: haystack = "aaaaa", needle = "bba"
Output: -1
Clarification:
What should we return when needle
is an empty string? This is a great question to ask during an interview.
For the purpose of this problem, we will return 0 when needle
is an empty string. This is consistent to C's strstr() and Java's indexOf().
Solution — Brute Force
Compare pattern with every substring in the original string. This is the worst approach but it’s simple to implement.
Complexity
If m
denotes to the length of haystack
and n
denotes to the length of needle
, it will take O(mn) time for double for-loop.
It’s trivial that it only needs O(1) extra space.
Solution — Rolling Hash
Considering the brute force mechanism, it looks like it will be faster if we can hash each substring and compare with hashed needle
pattern string. By using strategy called rolling hash, it’s easy to calculate the hash value of next substring by the hash value of previous substring.
Here we will use polynomial rolling hash method for implementation, but, unfortunately, it will reach time limit of LeetCode submission.
It’s noted that there might be overflow issue by adopting rolling hash strategy. And Robin-Karp algorithm can be used to solve this issue.
Complexity
If m
denotes to the length of haystack
and n
denotes to the length of needle
, it takes O(m) time for computing hash value of haystack[0:n]
and takes O(n) time for calculating hash value of needle
.For each iteration in traversing haystack
, rollingHash
function only takes O(1) for computing next hash value so it takes O(m) time for iterating whole list. Therefore, total time complexity is O(m + n + m) = O(m+n).
For space complexity, because we only use params to save hash value of each substring/pattern, so there is only O(1) extra space be used.
Solution — KMP
KMP algorithm improves brute force approach and builds failure function for finding prefix-suffix of each sub-pattern, so it doesn’t need to shift whole pattern back to the head again in each iteration.
Complexity
If m
denotes to the length of haystack
and n
denotes to the length of needle
, it takes O(n) space for maintaining failure function.
For time complexity, consider following problem:
There is a empty stack with three operations:
push(x): ordinary operation of a stack
pop: ordinary operation of a stack
multi-pop(k): pop
k
elements, stop immediately if stack is emptyIf we randomly do operations mentioned above
n
times, what’s its time complexity?
It is trivial that push/pop only take O(1) time but multi-pop may take O(k+s) if there is s
elements in this stack.
General Analysis
Because there is n
operations with worst case O(n) if we’d like to multi-pop n
elements out. So it takes O(n²) in worst case.
However, actually n² is not going to happen because stack is originally empty. Hence we have to consider following analytics.
Amortised Analysis
Consider there will only be n
elements can be pushed or popped in n
operations, it only takes O(n) time for this question.
This problem is just like string matching in KMP algorithm. If there is any character matched successfully, it will be saved into a stack. Once matching is failed, shifting sub-string is just like multi-pop operation.
Therefore, because there will be only n
characters in calculating failure value, it takes O(n) time for building failure function. Also, it takes O(m) time for doing string matching. And it’s total time complexity will be O(n+m).
Follow up
It’s noted to mention that you can also choose Boyer-Moore algorithm whose worst time complexity is O(mn) and best time complexity is O(m/n).