Merge pull request julycoding#382 from tastynoodle/master

add ebook/en/03.0.md
renozhang · Oct 17, 2014 · 9ff103d · 9ff103d
2 parents 34aeba4 + b9485d9
commit 9ff103d
Show file tree

Hide file tree

Showing 2 changed files with 109 additions and 3 deletions.
diff --git a/ebook/en/03.0.md b/ebook/en/03.0.md
@@ -0,0 +1,105 @@
+Chapter 3: Find Smallest K Numbers
+==========
+
+## Problem Description
+
+Given n integers, output the smallest k numbers.  
+
+## Analysis and Solutions
+### Solution One
+
+To get the smallest k numbers, the simplest way is, we sort the array, then output the first k numbers.
+
+As for the sorting algorithms，I guess quick-sort is a good choice ( as we known, the average time complexity of Quick-Sort is `n*logn`), then we just output the first k numbers. So the total time complexity is: `O(n * log n) + O(k)=O(n * log n)`.
+
+### Solution Two
+
+The problem does not require the smallest k numbers to be sorted, neither are the last n-k numbers. So we do not need to sort all the numbers. We can use Selection-Sort or Swap-Sort:
+
+1. Traverse the n numbers, put the first k numbers into a new array, assume they are the smallest k numbers;
+2. For these k numbers, use Selection-Sort or Swap-Sort to get the biggest element kmax (need to traverse these k numbers, the time complexity is `O(k)`);
+3. Traverse the rest n-k numbers. Suppose each time the new element is x, compare x with kmax: if `x < kmax`, replace kmax with x, then jump to step 2 and find the biggest element kmax; if `x >= kmax`, just continue traversing.
+
+In each traverse, the time complexity is `O(k)` if we update the array, `O(1)` if we do not. So totally the time complexity is `n*O(k) = O(n*k)`.
+
+### Solution Three
+
+A better way is maintaining a max-heap with capacity k, it is similiar with the solution two:
+
+1. Store the first k numbers in a max-heap with capacity k, assume they are the smallest k numbers;
+2. Sort the heap, make k1<k2<...<kmax (kmax is the biggest number in the heap);
+3. Traverse the rest n-k numbers. Suppose each time the new element is x, compare x with kmax: if `x < kmax`, replace kmax with x and reheapify the heap (`O(logk)`); Otherwise do nothing.
+
+In this case the time complexity is: `O(k+(n-k)*logk)=O(n*logk)`. This is because in the heap we just need `O(logk)` to operate a search or reheapify (In solution two, to find the biggest number in an array we need `O(k)`).
+
+### Solution Four
+
+The book \<Date Structures and Algorithm Analysis in C\>, in Chapter7, Section7.7.6, introduced a Quick-Selection algorithm. In average case, its time complexity is `O(n)`:
+
+- Suppose the number set is S, in S we choose a number v as the pivot. Divide S-{v} into S1 and S2, like the Quick-Sort.
+- If k <= |S1|, the k-th smallest number is in S1. In this case, return QuickSelect(S1, k).
+- If k = 1 + |S1|, the pivot is the k-th smallest number, return it.
+- Otherwise, the k-th smallest number is in S2. It is the (k - |S1| - 1) smallest number in S2, so we return QuickSelect(S2, k - |S1| - 1).
+
+The time complexity is O(n).
+
+You can find the code below:
+
+```cpp  
+//Quick-Select, put the k-th smallest number in a[k-1]  
+void QuickSelect( int a[], int k, int left, int right )
+{
+    int i, j;
+    int pivot;
+
+    if( left + cutoff <= right )
+    {
+        pivot = median3( a, left, right );
+        //set the median as the pivot, this can avoid the wort case, to some extent
+        i = left; j = right - 1;
+        for( ; ; )
+        {
+            while( a[ ++i ] < pivot ){ }
+            while( a[ --j ] > pivot ){ }
+            if( i < j )
+                swap( &a[ i ], &a[ j ] );
+            else
+                break;
+        }
+        //reset the pivot
+        swap( &a[ i ], &a[ right - 1 ] );  
+
+        if( k <= i )
+            QuickSelect( a, k, left, i - 1 );
+        else if( k > i + 1 )
+            QuickSelect( a, k, i + 1, right );
+    }
+    else  
+        InsertSort( a + left, right - left + 1 );
+}
+
+```
+
+This Quick-Select algorithm is kind of like the partition method in Quick-Sort. Store the N numbers in array S, set the 'median of median' as the pivot. Then divide array into Sa and Sb, Sa<=X<=Sb. If there are more than k elements in Sa, return the smallest k numbers in Sa, otherwise return all the elements in Sa and the smallest k-|Sa| numbers in Sb. In average case the time complexity is `O(n)`.
+
+Furthermore, the book \<Introduction to Algorithms\>, in Chapter9 Section9.3, introduced a Select Algorithm whose time complexity is O(n) in the worst case. I recommend you to have a look.
+
+
+## Think Deeper
+1.Google interview: input two arrays consisting of integers, sum each two of them we can get a new array, how to find the k-th smallest numbers in this new array?
+
+Analysis:  
+
+     ‘Suppose the two arrays are A and B, they all have N numbers, sum each two of them we can get a new array C, which has N^2 elements.
+      Then consider these as N sorted arrays:
+              A[1]+B[1] <= A[1]+B[2] <= A[1]+B[3] <=…
+              A[2]+B[1] <= A[2]+B[2] <= A[2]+B[3] <=…
+              …
+             A[N]+B[1] <= A[N]+B[2] <= A[N]+B[3] <=…
+      The problem turns into finding the smallest k numbers in these N^2 sorted array’
+
+
+2.Suppose there are two increasing arrays A and B, A=(a1,a2,...,ak), B=(b1,b2,...,bk). For 1<=i, j<=k, find the smallest k (ai+bj). The algorithm is required to be efficient.
+
+
+3.Given an array a1,a2,a3,...,an and m triples, each triple represents a query. For each query (i, j, k), output the k-th smallest number in ai，ai+1，...，aj.
diff --git a/ebook/en/Readme.md b/ebook/en/Readme.md
@@ -7,13 +7,14 @@
  - [1.1 - Left Rotating String](01.0.md)
 
 * Chapter 2  Array's Mystery
- - [2.6 - Maximum Subarray Problem](07.0.md)
+ - [2.1 - Find Smallest K Numbers](03.0.md)
+ - [2.6 - Maximum Subarray Problem](07.0.md) 
 
 * Chapter 3  Searching and Sorting
  - [3.1: Binary Search](25.0.md)
 
-* Chapter 4  Algorithm Optimization
- - [4.3: Perfect shuffle algorithm](35.0.md)
+* Chapter 4  Algorithm Optimization 
+ - [4.3: Perfect shuffle algorithm](35.0.md) 
 
 * Chapter 5  Dynamic Programming
  - [5.1 - Longest Common Subsequence](11.0.md)