Thursday, December 5, 2013

Sorting and Efficiency

Selection Sort
The selection sort algorithm is fairly simple compared to other sorting algorithms. The goal of one iteration is to move the smallest element of the unsorted part of the collection into its correct position. In other words, the algorithm compares the first element of the unsorted part to the remaining elements (also in the unsorted part) and keeps track of the smallest element found. Once there are no elements left to check, the first element is swapped with the smallest element found (if necessary). This operation is performed n times, where n is the number of elements in the collection, and after it has been performed for every position, the collection is sorted.
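
Here is a minimal sketch of selection sort in Python (my own illustration, assuming a plain list of comparable items):

def selection_sort(lst: list) -> None:
    """Sort lst in place using selection sort."""
    for i in range(len(lst)):
        # Find the index of the smallest element in the unsorted part.
        smallest = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[smallest]:
                smallest = j
        # Swap it into position i (if necessary).
        if smallest != i:
            lst[i], lst[smallest] = lst[smallest], lst[i]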

Quick Sort
This sorting algorithm uses a different approach. It first chooses one element from the collection to be sorted, the so-called pivot. Different quick sort implementations choose the pivot differently: some take the first element, some take the last and some choose the pivot randomly. Once the pivot is chosen, it's placed in its correct position in the collection. This is done by putting all the elements smaller than the pivot to its left and all the elements larger than the pivot to its right. After this, the pivot may end up in any position in the collection. It may become the first element (if it's the smallest), the last element (if it's the largest) or may be placed somewhere in between. The key thing is that the algorithm guarantees that the pivot is placed in its final position in the sorted collection. Once the pivot is placed, the same technique is applied (recursively) to the two (possibly empty) unsorted parts of the collection until the whole collection is sorted.
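
A simple sketch of the idea (my own illustration; it builds new lists rather than partitioning in place, and it takes the first element as the pivot):

def quick_sort(lst: list) -> list:
    """Return a new sorted list containing the elements of lst."""
    if len(lst) <= 1:
        return lst
    pivot = lst[0]
    smaller = [x for x in lst[1:] if x < pivot]
    larger = [x for x in lst[1:] if x >= pivot]
    # The pivot lands in its final sorted position between the two parts.
    return quick_sort(smaller) + [pivot] + quick_sort(larger)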

Merge Sort
The key point of merge sort is that this algorithm can easily build one sorted collection out of two other sorted collections. It does so by comparing the smallest elements of each collection (while they exist) and placing the smaller of the two into a new collection until one of the collections becomes empty. Once that happens, the remaining elements of the other collection are simply appended to the new collection. But merging is just one part of the algorithm. When we are given an unsorted collection of items to sort, we don't have a pair of sorted collections to merge, so the other part of merge sort creates that condition. Since a collection of a single element is always sorted, the algorithm breaks the given unsorted list down into smaller collections of a single element each. Then it merges them, using the technique described above, until the collection is sorted.
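
A sketch of both parts in Python (again my own illustration, not code from the course):

def merge(left: list, right: list) -> list:
    """Merge two sorted lists into one sorted list."""
    result = []
    i, j = 0, 0
    # Repeatedly take the smaller of the two front elements.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    return result + left[i:] + right[j:]


def merge_sort(lst: list) -> list:
    """Return a new sorted list containing the elements of lst."""
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2
    return merge(merge_sort(lst[:mid]), merge_sort(lst[mid:]))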

Efficiency
Efficiency is a very important aspect of programming to consider when developing an algorithm to solve a particular problem. An inefficient solution will do more work (calculations, storing of values and so on) than is required. In other words, such a solution will use more resources than it should, such as memory space and processing time. To sum up the idea of efficiency: whenever we have two algorithms that solve the same problem, we should prefer the more efficient one, even if it's longer and more complex.
The efficiency of any algorithm can be expressed in terms of the size of the input (usually denoted by n). Even though we can express the precise growth function of an algorithm, in most cases we are only interested in the dominant term, or the Big-O value, which gives an accurate estimate for large input sizes. For example, an algorithm that performs 3n² + 5n + 2 steps is simply O(n²).
If n is the size of the collection to be sorted, then the following are the efficiencies of the sorting algorithms described above:
Selection sort: O(n²) in every case.
Quick sort: O(n log n) in the best and average cases and O(n²) in the worst case.
Merge sort: O(n log n) in every case.
So if we had to choose one of these three sorting algorithms to sort large collections of items, we should choose merge sort, because it is the only one of the three that guarantees O(n log n) performance in every case.

Wednesday, November 27, 2013

Dynamic Programming
  
Today in class we discussed how in certain circumstances the recursive approach to solving a problem does far more work than required. In the case of generating the Fibonacci sequence, the recursive method was doing the same calculations over and over, making the solution very inefficient. This is where the dynamic programming approach was introduced. Such an approach "memoizes" the solutions to the subproblems, and whenever the algorithm encounters the same subproblem again it simply looks up the solution in a solution collection (such as a list or dictionary). As expected, the new approach (still recursive, but a smarter one) increased the efficiency of the algorithm.
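
A small sketch of the idea on the Fibonacci example (my own version, not the exact code from lecture):

# Solutions to subproblems, keyed by n.
fib_cache = {0: 0, 1: 1}


def fib(n: int) -> int:
    """Return the n-th Fibonacci number, memoizing intermediate results."""
    if n not in fib_cache:
        # Each value is computed once; later calls just look it up.
        fib_cache[n] = fib(n - 1) + fib(n - 2)
    return fib_cache[n]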

I used a similar technique for assignment 1.
We had to find the best split of cheeses in order to move all the cheeses from the first stool to the last one using four stools. My method stored the best split of cheeses in increasing order starting from 3 cheeses, making all the solutions required for each next number of cheeses available.

Below is my actual implementation of the method. 

# A dictionary that contains the best split (i) and the minimum number of
# moves it takes to move n cheeses using 4 stools for any n cheeses.
# The dictionary contains the required values once best_partition(n)
# is executed.
library = {1: [0, 1], 2: [1, 3]}


def best_partition(n: int) -> list:
    """
    For a given number of cheeses find and return a list that contains the
    best split (starting from the bottom) and the minimum number of moves it
    takes to move all the cheeses using 4 stools.

    For example, to move 7 cheeses it's best to split them at i = 3, and
    this takes 25 moves, so:

    >>> best_partition(7)
    [3, 25]
    """

    if n in library:
        return library[n]
    else:

        # Next number of cheeses missing in the library.
        next_n = len(library) + 1

        # Steps it would take if 3 stools were used.
        min_steps = (2 ** next_n) - 1

        # Find the lowest number of steps and record the best split
        # together with the number of steps taken for the best split.
        for i in range(1, next_n):
            steps = (2 ** i) - 1 + 2 * best_partition(next_n - i)[1]
            if steps < min_steps:
                min_steps = steps
                library[next_n] = [i, min_steps]

        # Recurse until the library contains an entry for n itself.
        return best_partition(n)



Monday, November 18, 2013

Assignment 2

Since Danny said in class today that assignment 2 is marked, I suspect that the grade should be similar to the auto marker results shown on MarkUs. The auto tester shows that I have 38/38 passes, so I would like to share my solution here. I would be happy to compare my solution to someone else's, or to hear suggestions on how my algorithm could be made more efficient.

class RegexTree:
    """A Regex Tree"""
    def __init__(self: 'RegexTree', regex: str) -> None:
        """Req: regex must be valid regular expression.
           Initialize a regular expression tree.
        """
        # Parse a regex into tree.
        self.root = self.regex_to_tree(regex)

    def regex_to_tree(self: 'RegexTree', regex: str):
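        """Parse regex and return the root node of its regex tree."""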
        # Last index of the regex string.
        last = len(regex) - 1

        # Create RegexTreeNode for '0', '1', or 'e'.
        if len(regex) == 1:
            return RegexTreeNode(regex, [])

        # Create StarNode for a regex that ends with '*'.
        elif regex[last] == '*':
            return StarNode(self.regex_to_tree(regex[:last]))

        # Create BarNode or DotNode.
        elif regex[0] == '(' and regex[last] == ')':

            # Initialize parentheses counters.
            l_paren = 0
            r_paren = 0

            # Go through each character in the regex string, ignoring
            # the first and the last characters, which are '(' and ')'.
            for i in range(1, last):

                # Increment the left parentheses counter on '('.
                if regex[i] == '(':
                    l_paren += 1

                # Increment the right parentheses counter on ')'.
                elif regex[i] == ')':
                    r_paren += 1

                # Create BarNode at the top-level '|'.
                elif regex[i] == '|' and l_paren == r_paren:
                    return BarNode(self.regex_to_tree(regex[1:i]),
                                   self.regex_to_tree(regex[(i + 1):last]))

                # Create DotNode at the top-level '.'.
                elif regex[i] == '.' and l_paren == r_paren:
                    return DotNode(self.regex_to_tree(regex[1:i]),
                                   self.regex_to_tree(regex[(i + 1):last]))


def regex_match(r: 'RegexTree', s: str) -> bool:
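    """Return True iff the string s matches the regex represented by r."""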
    return node_match(r.root, s)


def node_match(node: 'RegexTreeNode', s: str) -> bool:
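    """Return True iff the string s matches the regex tree rooted at node."""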
    # Check if a regex of length one (when the symbol of the
    # node is '0', '1' or 'e') matches a given string.
    if node.children == []:
        return ((node.symbol == '0' and s == '0') or
                (node.symbol == '1' and s == '1') or
                (node.symbol == 'e' and s == ''))

    # Check if a regex of the form r* (when the symbol of the
    # node is '*') matches a given string.
    elif node.symbol == '*':
        # Return True if the given string s is an empty string.
        if s == '':
            return True

        # Otherwise try every non-empty prefix of s: the string matches
        # r* if some prefix matches r and the rest of s matches r*.
        # Trying every prefix (instead of committing to the first one
        # that matches) makes sure no valid split is missed.
        else:
            for i in range(1, len(s) + 1):
                if (node_match(node.children[0], s[:i]) and
                        node_match(node, s[i:])):
                    return True

            # No prefix of s works, so s doesn't match r*.
            return False

    # Check if a regex of the form (r1|r2) (when the symbol of the
    # node is '|') matches a given string.
    elif node.symbol == '|':
        return (node_match(node.children[0], s) or
                node_match(node.children[1], s))

    # Check if a regex of the form (r1.r2) (when the symbol of the
    # node is '.') matches a given string.
    elif node.symbol == '.':
        for i in range(len(s) + 1):
            # Partition the string s into all possible strings s1 and s2,
            # where s1 + s2 = s, until r1 matches s1 and r2 matches s2.
            if (node_match(node.children[0], s[:i]) and
                    node_match(node.children[1], s[i:])):
                return True

        # Return False if such a partition of s into s1 and s2 is impossible.
        return False
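
For example (assuming the RegexTreeNode, StarNode, BarNode and DotNode classes from the starter code are in scope):

>>> regex_match(RegexTree('(1.(0|1)*)'), '1001')
True
>>> regex_match(RegexTree('(1.(0|1)*)'), '01')
False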

Thursday, November 14, 2013

Midterm 2

Just yesterday we had our second midterm, and this time I feel much better about it. I'd simply like to share my solutions here in case someone wants to compare results or point out any mistakes I may have made. I hope I remembered the problems as they were.

Question 1 asked us to count the number of internal nodes and the number of leaves in a binary tree.

def count_nodes(n: BTNode) -> (int, int):
    if n is None:
        return (0, 0)
    elif n.left is None and n.right is None:
        return (0, 1)
    else:
        # Count each subtree once instead of recursing into it twice.
        left = count_nodes(n.left)
        right = count_nodes(n.right)
        return (1 + left[0] + right[0], left[1] + right[1])

Question 2 asked us to count the number of even integers in a binary tree.

def count_even(n: BTNode) -> int:
    if n is None:
        return 0
    elif n.value % 2 == 0:
        return 1 + count_even(n.left) + count_even(n.right)
    else:
        return count_even(n.left) + count_even(n.right)

Question 3 asked us to construct a path to the minimum element in a BST and return the first LLNode in the chain (I hope it didn't ask us to construct the shortest path).

def min_path(n: BSTNode) -> LLNode:
    if n is None:
        return None
    else:
        # The minimum of a BST is reached by always going left.
        return LLNode(n.value, min_path(n.left))

Sunday, November 10, 2013

Sorting

For large amounts of data we prefer to use the binary search technique instead of linear search. However, binary search requires the search pool to be sorted. The need for sorted collections of objects led to the development of various sorting algorithms. Some of the sorting techniques (that I know) are: bubble sort, insertion sort, selection sort, merge sort, quicksort and heapsort. Among those, bubble, insertion and selection sorts are less efficient (on average) and perform in O(n²). The rest can sort the elements with a complexity of O(n log n). However, on particular types of inputs, or in the best case, some of the less efficient algorithms become more efficient (bubble sort and insertion sort have an O(n) best case). The same applies to the more efficient algorithms in the worst case: for example, quicksort on its worst input will do the job in O(n²). This is why the Python developers, for example, use a combination of different sorts in the built-in sort() method (Timsort, a hybrid of merge sort and insertion sort).
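
A sketch of insertion sort (my own illustration) that shows where the O(n) best case comes from: on an already-sorted list the inner loop never runs.

def insertion_sort(lst: list) -> None:
    """Sort lst in place using insertion sort."""
    for i in range(1, len(lst)):
        current = lst[i]
        j = i
        # Shift larger elements right until current's spot is found.
        # On an already-sorted list this loop body never runs, so the
        # whole sort takes only one pass: the O(n) best case.
        while j > 0 and lst[j - 1] > current:
            lst[j] = lst[j - 1]
            j -= 1
        lst[j] = current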

Sunday, November 3, 2013

Algorithm Complexity

In programming we are often interested in how our final piece of software performs. In particular, we usually want to find the relationship between the size of the input (to the program) and the time/space the program will require to perform the operations needed. We do this to minimize the resources a program will use: we don't want to overload the hardware that runs our software, and we don't want the user to wait forever for the program to execute. Once we find the relationship (in the form of a function), we can determine the order, which is basically the leading term of the function. The order tells us most of what we need to know, and it can be constant, logarithmic, linear, quadratic, cubic, exponential or even worse (listed from most to least efficient). To get a feel for the difference: for an input of size 1,000,000, a logarithmic algorithm performs about 20 steps, a linear one about a million, and a quadratic one about a trillion. In the end, if we have several algorithms that perform the same task, we always prefer the one with the best order.

Monday, October 28, 2013

Binary Search Tree

As introduced recently in class a BST (Binary Search Tree) is a special type of a binary tree that has a new property. The special thing about a BST is that every element in a tree is larger than its left child and smaller than its right child (if they exist). The element of the tree (nodes) don't have to be numbers, they should simply by comparable objects. Arranging elements into a BST can be viewed as sorting the elements, since such arrangement makes it easier to search through all the elements. In particular, this turns the linear search with the order of n into a binary search which has a complexity of log2(n).