{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python Data Structures\n",
    "**list, set, tuple, named tuple, dictionary**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## list\n",
    "* We generally use a list to store several values of the same type, such as a list of integers or a list of strings.\n",
    "* Lists are useful when we want to modify the contents, for example by adding or deleting elements.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "xL = []  # empty list\n",
    "yL = list()  # empty list\n",
    "zL = [21, 3, 7, 11]\n",
    "print zL\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Via list comprehension**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# [<some_func>(x) for x in <something> if  <some_condition_is_true>]\n",
    "oddSquareL = [x*x for x in xrange(1, 11, 2) if x%5 == 0 ] \n",
    "print oddSquareL\n",
    "oddSquareL = [x*x for x in xrange(1, 11, 2)  ] \n",
    "print oddSquareL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "continentsL = ['Europe', 'Africa', 'America', 'Antarctica', 'Asia', 'Australia']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "compositeL = zL + oddSquareL\n",
    "print compositeL\n",
    "mixedL = compositeL + continentsL\n",
    "print mixedL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "listOfL = [zL, oddSquareL] # list of list\n",
    "print listOfL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can index and slice a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print oddSquareL\n",
    "print oddSquareL[2] # third element\n",
    "print oddSquareL[3:] # all the elements from index 3 to end \n",
    "print oddSquareL[-1]  # negative for indexing from last\n",
    "print oddSquareL[-3:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modifying a list\n",
    "#### inserting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "zL.append(22)\n",
    "print zL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "zL.insert(2, 5)\n",
    "print zL\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### deleting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "zL.append(5)\n",
    "print zL\n",
    "zL.remove(5)\n",
    "print zL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### sorting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "zL.sort()\n",
    "print zL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print continentsL\n",
    "continentsL.sort()\n",
    "print continentsL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What if we want to sort by length?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "continentsL.sort?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "continentsL.sort(key=len)\n",
    "print continentsL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can write our own function and pass it as the key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def keyFunc(s1):\n",
    "    return s1[-3:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercise: sort using the last 3 characters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Something I am learning over time**\n",
    "* How to iterate over elements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "for item in oddSquareL:\n",
    "    print item"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we also need the index of each element in the list, use `enumerate` as shown in the following syntax. It is the more Pythonic way of doing it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "for index, item in enumerate(oddSquareL):\n",
    "    print index, item"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Flattening a list of lists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "listOfList = [[1, 2], [4, 5], [10, 13]]\n",
    "# one way to do it\n",
    "flatList = []\n",
    "for innerList in listOfList:\n",
    "    for item in innerList:\n",
    "        flatList.append(item)\n",
    "print flatList        \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# or we can use list comprehension\n",
    "\n",
    "flatList1 = [ item for innerList in listOfList for item in innerList]\n",
    "print flatList1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Simulating coin flips"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from random import random\n",
    "coinToss = [ int(round(random())) for x in xrange(10)]\n",
    "print coinToss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "More examples at http://howchoo.com/g/ngi2zddjzdf/how-to-use-list-comprehension-in-python\n",
    "\n",
    "More list comprehension usage examples appear in the **Utility** section, using the os module.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## tuple and named tuple\n",
    "Tuples are used to aggregate different pieces of data in one container.\n",
    "Tuples are immutable, so we can't modify their contents.\n",
    "\n",
    "**Creating a tuple**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ourGalaxy = (\"Milky Way \", \"100,000-120,000\", 200, 100) # name, diameter, stars, planets\n",
    "print ourGalaxy\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ourGalaxy[3] = 400 # raises TypeError because tuples are immutable"
   ]
  },
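  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of how the failed assignment above can be handled, and how to get an updated tuple anyway. The names `exampleGalaxy` and `updatedGalaxy` are made up for this illustration, and the cell is written so that it should run on both Python 2.7 and Python 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "exampleGalaxy = ('Milky Way', '100,000-120,000', 200, 100)  # hypothetical copy of the data\n",
    "try:\n",
    "    exampleGalaxy[3] = 400  # item assignment is not supported on tuples\n",
    "except TypeError as err:\n",
    "    print(err)\n",
    "# build a new tuple instead of modifying the old one\n",
    "updatedGalaxy = exampleGalaxy[:3] + (400,)\n",
    "print(updatedGalaxy)"
   ]
  },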
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tuples are good for packing values together and passing them between functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from __future__ import  division\n",
    "def calculatePlanetPerStar(galaxy):\n",
    "    name, diameter, stars, planets = galaxy # tuple unpacking\n",
    "    return planets/stars\n",
    "print calculatePlanetPerStar(ourGalaxy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can do indexing and slicing in tuples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print('number of planets {}'.format(ourGalaxy[3]))\n",
    "print ourGalaxy[2:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One major disadvantage of tuples is readability: what does the third position of a tuple hold?\n",
    "We can often guess from context, but it is hard in general when a tuple has many elements.\n",
    "\n",
    "**Named tuples come to the rescue in this situation**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from collections import  namedtuple\n",
    "ChannelInfo = namedtuple('ChannelInfo','redChFile, blueChFile, greenChFile')\n",
    "channelInfo = ChannelInfo('redXX.tif', 'bluXX.tif', greenChFile= 'greenXX.png')\n",
    "print channelInfo.blueChFile\n"
   ]
  },
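  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A named tuple is still a tuple, so indexing and unpacking keep working alongside access by name. The sketch below uses a made-up `PixelSize` type (not part of the examples above) and the standard `_replace` and `_asdict` helpers; it should run on both Python 2.7 and Python 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from collections import namedtuple\n",
    "\n",
    "# hypothetical record type, used only for this illustration\n",
    "PixelSize = namedtuple('PixelSize', 'width height')\n",
    "size = PixelSize(width=640, height=480)\n",
    "print(size.width)  # access by field name\n",
    "print(size[1])     # positional indexing still works\n",
    "w, h = size        # so does tuple unpacking\n",
    "wider = size._replace(width=800)  # returns a new named tuple; size is unchanged\n",
    "print(wider)\n",
    "print(size._asdict()['height'])   # field values as a dictionary"
   ]
  },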
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dictionaries\n",
    "Dictionaries map objects to other objects, i.e. they store key-value pairs. Lookup by key is extremely efficient.\n",
    "\n",
    "They are also a good way to pass a variable number of parameters to functions.\n",
    "\n",
    "**Key:** the value used as the index\n",
    "\n",
    "**Value:** the stored value"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### creating dictionaries\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "algoConfig = {} # can also use algoConfig = dict()\n",
    "algoConfig['threshold'] = 20 # in pixel\n",
    "algoConfig['neighborhoodSize'] = 4 # 4X4 neighborhood\n",
    "algoConfig['tolerance'] = 0.002 # convergence tolerance\n",
    "print algoConfig"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "charValDict = {'a':2, 'b': 10, 'c':54}\n",
    "print charValDict\n",
    "print charValDict['b']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using dictionary comprehension"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from collections import  namedtuple\n",
    "examScore = namedtuple('examScore', 'firstExam secondExam thirdExam')\n",
    "studentScore = [ ('John', examScore(50, 10, 100)), ('sam', examScore(60, 20, 90)),\n",
    "                    ('clark', examScore(30, 70, 80))]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's say we want to build a dictionary mapping each student's name to their total score."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "finalScoreDict = {  name: sum(score) for  name, score in studentScore }\n",
    "print finalScoreDict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's go back to the algoConfig dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print algoConfig"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print algoConfig['maxLength'] # raises KeyError: 'maxLength' is not a key in algoConfig"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# To avoid KeyError we can use\n",
    "print(algoConfig.get('maxLength'))\n",
    "# or\n",
    "print(algoConfig.get('maxLength', 'Value Not Found'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "algoConfig['maxLength'] # still raises KeyError: get() does not add the key"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also use the setdefault method, which sets the value only if the key is not already in the dictionary, and returns the value stored under that key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print algoConfig.setdefault('threshold', 0)\n",
    "print algoConfig.setdefault('radius', 2)"
   ]
  },
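  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "setdefault is also handy when the default is a mutable container. A small sketch with made-up data and names (`wordsL`, `byFirstLetter`) that groups words by their first letter; it should run on both Python 2.7 and Python 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "wordsL = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # made-up data\n",
    "byFirstLetter = {}\n",
    "for word in wordsL:\n",
    "    # create an empty list the first time a letter is seen, then append to it\n",
    "    byFirstLetter.setdefault(word[0], []).append(word)\n",
    "print(byFirstLetter['a'])\n",
    "print(byFirstLetter['b'])"
   ]
  },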
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Other useful dictionary methods are keys(),  values() and items()**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "for name, value in algoConfig.items():\n",
    "    print(' parameter name is {} and value is {}'.format(name, value))"
   ]
  },
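  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A short sketch of keys() and values() next to a membership test, using a made-up `sampleConfig` dictionary. The results are wrapped in sorted() so the printed order does not depend on the Python version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sampleConfig = {'threshold': 20, 'tolerance': 0.002}  # made-up values\n",
    "sortedKeys = sorted(sampleConfig.keys())\n",
    "sortedValues = sorted(sampleConfig.values())\n",
    "print(sortedKeys)\n",
    "print(sortedValues)\n",
    "print('threshold' in sampleConfig)  # membership is tested against the keys"
   ]
  },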
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**defaultdict**\n",
    "\n",
    "setdefault is useful when a key doesn't exist yet, but calling it every time becomes monotonous.\n",
    "\n",
    "Let's say we want to count the frequency of each number in a list.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# we can do this way\n",
    "observ = [1, 3, 5, 7, 3, 7, 7]\n",
    "def countFreq(observations):\n",
    "    frequencies = {}\n",
    "    for item in observations:\n",
    "        freq = frequencies.setdefault(item, 0)\n",
    "        frequencies[item] = freq + 1\n",
    "    return frequencies    \n",
    "print countFreq(observ)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# or use defaultdict\n",
    "from collections import defaultdict\n",
    "def countFreqDefaultDictUsage(observations):\n",
    "    frequencies = defaultdict(int)\n",
    "    for item in observations:        \n",
    "        frequencies[item] = frequencies[item] + 1\n",
    "    return frequencies    \n",
    "print countFreqDefaultDictUsage(observ)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**We can also use Counter**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from collections import  Counter\n",
    "print Counter(observ)"
   ]
  },
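  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A Counter behaves like a dictionary with a few extras. The sketch below repeats the observation values under a new name, `sampleObserv`, so the cell is self-contained; it should run on both Python 2.7 and Python 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "sampleObserv = [1, 3, 5, 7, 3, 7, 7]  # same values as observ above\n",
    "counts = Counter(sampleObserv)\n",
    "print(counts[7])              # frequency of a single value\n",
    "print(counts[42])             # missing values count as 0 instead of raising KeyError\n",
    "print(counts.most_common(2))  # the two most frequent values with their counts"
   ]
  },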
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sets\n",
    "Sets are similar to mathematical sets. They represent a collection of unique things.\n",
    "We can check set cardinality and membership, and compute unions, intersections, etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### creating and modifying sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "nameSet = set() # empty set\n",
    "ageSet = set([10, 20, 5])\n",
    "print ageSet\n",
    "ageSet.add(15)\n",
    "print ageSet\n",
    "print len(ageSet) #cardinality of set\n",
    "print 10 in ageSet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can create a set using set comprehension"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "squareSet = {i*i for i in xrange(1, 11)}\n",
    "print squareSet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sets are useful for removing duplicates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "dupList = [20, 10, 70, 30, 10, 20]\n",
    "ageSet1 = set(dupList)\n",
    "print ageSet1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ageSet2 = set([20, 10, 70, 30, 10, 20])\n",
    "print ageSet2.intersection(ageSet)\n"
   ]
  },
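  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The other set operations follow the same pattern as intersection. A small sketch with made-up sets `smallSet` and `largeSet`; results are sorted before printing because sets have no guaranteed order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "smallSet = set([5, 10, 15, 20])  # made-up values\n",
    "largeSet = set([10, 20, 30, 70])\n",
    "print(sorted(smallSet.union(largeSet)))         # elements in either set\n",
    "print(sorted(smallSet.intersection(largeSet)))  # elements in both sets\n",
    "print(sorted(smallSet.difference(largeSet)))    # elements only in smallSet\n",
    "print(smallSet.issubset(largeSet))              # is every element of smallSet in largeSet?"
   ]
  },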
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise: Suppose we have the following list of tuples (instructor name, course taught).**\n",
    "**What are the different courses taught in the department?**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "instructorCourseLibrary = [('Dr. Blecher', 'Functional Analysis'), ('Dr. Azencott','statistics') ,\n",
    "                           ('Dr. Demetrio', 'Functional Analysis') ,('Dr. Fuo ', 'statistics' ) ,\n",
    "                           ('Dr. Pan', 'calculus 3')]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Now back to writing functions, but with a variable number of arguments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def logInfo(level, *args ):\n",
    "    print level\n",
    "    print args\n",
    "    for arg in args:\n",
    "        print arg\n",
    "        \n",
    "logInfo(1, '9 Nov', 'Houston', 75 )        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These variable arguments are actually packed into a tuple.\n",
    "\n",
    "Using the \\* operator we can do the opposite, i.e. unpack the arguments out of a list or tuple."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "param = [1, 5]\n",
    "\n",
    "range(*param)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Similarly, to pass a variable number of keyword arguments, use the following syntax**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def logInfoKeyWord(level, **kwargs ):\n",
    "    print level\n",
    "    print kwargs\n",
    "    for key , val in kwargs.items():\n",
    "        print('key = {} value = {}'.format(key, val))\n",
    "        \n",
    "\n",
    "logInfoKeyWord(1, Date= '9 Nov', City = 'Houston', Temperature=75)\n",
    "     "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or we can use the \\*\\* operator on a dictionary to pass its items as keyword arguments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "params = {'Date': '9 Nov', 'City' : 'Houston', 'Temperature':75} \n",
    "logInfoKeyWord(1, **params )   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Zero or more normal arguments may occur before the variable arguments.\n",
    "If we use both, then \\*args must occur before \\*\\*kwargs."
   ]
  },
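  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of a function that accepts a normal argument, \\*args and \\*\\*kwargs together, called with both unpacking operators. `describeCall` is a hypothetical helper written for this illustration; it returns what it received so the result is easy to inspect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def describeCall(level, *args, **kwargs):\n",
    "    # hypothetical helper: returns what it received instead of printing it\n",
    "    return level, args, sorted(kwargs.items())\n",
    "\n",
    "positional = ['9 Nov', 'Houston']\n",
    "named = {'Temperature': 75}\n",
    "# the list fills *args and the dictionary fills **kwargs\n",
    "result = describeCall(1, *positional, **named)\n",
    "print(result)"
   ]
  },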
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Demo: How to debug a Python program"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}