Reading for Week 1

Work through Section 3 of the Python Tutorial http://docs.python.org/3/tutorial/ till the end of 3.1.2 Strings.

Read the first chapter of the book Programming Knights till the end of 1.7 Examples of Programs Using the input() statement.

Also, look up the definition of str.find at http://docs.python.org/3/library/stdtypes.html#str.find.

Goal

Introduction to Python in the context of implementing a simple search engine.

Based on:

Udacity Course CS101 "Intro to Computer Science"

The Python Tutorial

Build a Search Engine
Find Data (Weeks 1-3)
Build an Index (Weeks 4-5)
Rank Pages (optional)

Week 1 - Find Data - How to Get Started

Installing Python

Go to http://www.python.org/getit/releases/3.3.3/ and follow the instructions under Download

First Python Program

  print(3)
  print(1 + 1)
  print(52 * 3 + 12 * 9)
  print((52 * 3) + (12 * 9))
  print(52 * (3 + 12) * 9)
  print(365 * 24 * 60 * 60) 

First Programming Quiz

  # Write Python code that prints out 
  # the number of minutes in 5 weeks.

Syntax errors / Grammar / Backus-Naur Form

  # the following Python code produces a syntax error
  print(2 + 2 +)

Python Grammar for Arithmetic Expressions (Simplified)

  Expression      -> Expression Operator Expression
  Expression      -> Number
  Operator        -> +
  Operator        -> *
  Number          -> 0,1,2...
  Expression      -> (Expression)
  Print_Statement -> print(Expression)
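
The grammar can be turned into a small checker. The sketch below is one possible way to test whether a piece of text can be produced starting from Expression; it is not how Python itself parses code. The function names (tokenize, match_operand, match_expression, is_expression) are made up for this illustration. The rule Expression -> Expression Operator Expression is rewritten here as "an operand followed by any number of Operator operand pairs", which accepts the same strings.

```python
def tokenize(text):
    """Split text into tokens: numbers, +, *, ( and )."""
    for symbol in "+*()":
        text = text.replace(symbol, " " + symbol + " ")
    return text.split()


def match_operand(tokens, pos):
    """Match a Number or a parenthesized Expression starting at pos.

    Returns the position just after the match, or -1 if nothing matches.
    """
    if pos >= len(tokens):
        return -1
    if tokens[pos].isdigit():           # Expression -> Number
        return pos + 1
    if tokens[pos] == "(":              # Expression -> (Expression)
        pos = match_expression(tokens, pos + 1)
        if pos != -1 and pos < len(tokens) and tokens[pos] == ")":
            return pos + 1
    return -1


def match_expression(tokens, pos):
    """Match an operand followed by zero or more Operator operand pairs."""
    pos = match_operand(tokens, pos)
    while pos != -1 and pos < len(tokens) and tokens[pos] in ("+", "*"):
        pos = match_operand(tokens, pos + 1)
    return pos


def is_expression(text):
    """True if the whole text can be produced starting from Expression."""
    tokens = tokenize(text)
    return match_expression(tokens, 0) == len(tokens)


print(is_expression("1 + 1"))                # => True
print(is_expression("(52 * 3) + (12 * 9)"))  # => True
print(is_expression("4 *"))                  # => False
print(is_expression("* 4"))                  # => False
```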

Which of the following are valid Python expressions that can be produced starting from Expression?

  3
  ((3)
  (1*(2*(3*4)))
  + 3 3
  (((7)))
  2 + 2 + 

Programming Quiz

  # Write Python code to print out 
  # how far light travels in centimeters in one nanosecond.
  #
  # speed of light = 299 792 458 meters / second 
  # meter = 100 centimeters
  # nanosecond = 1.0 / 1000000000 

Variables / Assignment

  # Write Python code to print out 
  # how far light travels in centimeters in one nanosecond.

  speed_of_light = 299792458 
  billionth = 1.0 / 1000000000
  meter = 100
  print(speed_of_light * meter * billionth)

Variables - Programming Quiz

  # Given the variables defined here, write Python 
  # code that prints out the distance, in meters, 
  # that light travels in one processor cycle. 

  speed_of_light = 299792458
  cycles_per_second = 2700000000

Variables / Assignment

  # = means assignment
  speed_of_light = 299792458

  # 2.7 GHz
  cycles_per_second = 2700000000           
  print(speed_of_light * 1.0 / cycles_per_second)
   
  # 2.8 GHz
  cycles_per_second = 2800000000           
  print(speed_of_light * 1.0 / cycles_per_second)
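
In Python 3 the / operator always performs true division and returns a float, so multiplying by 1.0 first does not change the result. The short sketch below compares the two forms and also shows floor division (//), which discards the fractional part. The names with_factor and without_factor are chosen only for this illustration.

```python
speed_of_light = 299792458
cycles_per_second = 2700000000

with_factor = speed_of_light * 1.0 / cycles_per_second
without_factor = speed_of_light / cycles_per_second

# Both forms give the same float in Python 3.
print(with_factor == without_factor)          # => True

# Floor division keeps only the whole-number part.
print(speed_of_light // cycles_per_second)    # => 0
```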

Variables - Programming Quiz

  # What is the value of hours after running this code?
      
  hours = 9
  hours = hours + 1
  hours = hours * 2

Variables - Programming Quiz 2

  # What is the value of seconds after running this code?
      
  minutes = minutes + 1
  seconds = minutes * 60

Variables - Programming Quiz 3

  # Write Python code that defines the variable 
  # age to be your age in years, and then prints 
  # out the number of days you have been alive.

Strings

Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes or double quotes with the same result. \ can be used to escape quotes:

  print('I am a string')
  print("I prefer double quotes")
  print("I'm happy I started with a double quote")
  print('I don\'t mind a single quote')

  # using a variable 
  hello = "Hello"
  print(hello) 

Strings - Quiz

  # Which of the following is a valid string?
    
  "Ada"
  'Ada"
  "Ada
  Ada
  '"Ada' 

Strings - Programming Quiz

    # Define a variable, name, and assign to it 
    # a string that is your name 

String Concatenation

Strings can be concatenated (glued together) with the + operator, and repeated with the * operator.

  # Define a variable, name, and assign to it 
  # a string that is your name. 
  # Print out the word Hello followed by your name and three !'s

  name = "Pawel"
  print("Hello " + name + " !!!")

  # print out the text "repeat three times" three times

  text = "repeat three times "
  print(3 * text)

Indexing Strings

Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one.

    # < string >[< expression >]
    
    #      012345678901234
    print('Intro to Python'[0])   # => 'I'
    
    #      012345678901234
    print('Intro to Python'[1+1]) # => 't'
    
    #       01234
    name = 'Pawel'
    print(name[1])                # => 'a'

Indexing Strings

Indices may also be negative numbers and allow us to start counting from the right (the end of the string).

Note that -0 is the same as 0, so negative indices start from -1.

    #      0123
    print('word'[-1])   # => 'd'
    
    #      0123
    print('word'[-2])   # => 'r'
    
    #       01234
    name = 'Pawel'
    print(name[-3])     # => 'w'
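
A negative index can always be translated into a non-negative one: for a string of length n, index -k refers to the same character as index n - k. A minimal sketch using the built-in len() function, which returns the length of a string:

```python
word = 'word'

# -k and len(word) - k select the same character.
print(word[-1], word[len(word) - 1])   # => d d
print(word[-2], word[len(word) - 2])   # => r r

# The most negative valid index reaches the first character.
print(word[-len(word)], word[0])       # => w w
```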

Indexing Strings - Quiz

    # Which of these pairs are two things 
    # with the exact same value?
    # s is a variable whose value is an arbitrary string

    print( s[3]      ,  s[1+1+1]        )
    print( s[0]      ,  (s+s)[0]        )
    print( s[0]+s[1] ,  s[0+1]          )
    print( s[1]      ,  (s+' is OK')[1] )
    print( s[-1]     ,  (s+s)[-1]       )        

Selecting Substrings / Slicing Strings

In addition to indexing, slicing is also supported. While indexing is used to obtain an individual character, slicing allows you to obtain a substring.

    # < string > [< expression >] => one-character string
    #               number
    
    print('Pawel'[1])           # => 'a' 

    print('Pawel'[1:3])         # => 'aw'

Selecting Substrings

    #               start            stop
    # < string > [< expression > : < expression >]
    #   s           number           number
    
    # => string that is a subsequence of 
    #    the characters in s 
    #    starting from position start and
    #    ending with position stop-1
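
The rule can be checked by hand on a short example. The sketch below assumes an example string and the variable names start, stop and piece, none of which come from the course material. It shows that the slice contains the characters at positions start through stop-1, so its length is stop - start.

```python
s = 'search'
#    012345
start = 1
stop = 4

piece = s[start:stop]
print(piece)                     # => ear

# The same characters, picked one by one: positions 1, 2 and 3.
print(s[1] + s[2] + s[3])        # => ear

# The character at position stop is not included.
print(len(piece), stop - start)  # => 3 3
```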

Selecting Substrings

    name = 'Pawel'
    #       01234
    
    print(name[1:4])            # => 'awe'

    word = 'assume'
    #       012345

    print(word[3])              # => 'u'
    print(word[4:6])            # => 'me'
    print(word[4:])             # => 'me'
    print(word[:2])             # => 'as'
    print(word[:])              # => 'assume'   

Selecting Substrings - Programming Quiz

    # Write Python code that prints out Ucf (with a capital U), 
    # using the string variable s that 
    # is assigned the string 'ucf'.

    s = 'ucf' 

Selecting Substrings - Quiz

    # for any string s = '< any string >'
    # which of these is always equivalent 
    # to s?
    
    s[:]
    s + s[0:-1+1]
    s[0]
    s[:-1]
    s[:3] + s[3:]

Selecting Substrings

One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n.

    #  +---+---+---+---+---+---+
    #  | P | y | t | h | o | n |   
    #  +---+---+---+---+---+---+
    #  0   1   2   3   4   5   6
    # -6  -5  -4  -3  -2  -1
    #
    #  The slice from i to j consists of all
    #  characters between the edges labeled i and j

    print("Python"[-4:6]) # => 'thon'

Strings are immutable

Python strings cannot be changed - they are immutable. Therefore, assigning to an index position in the string results in an error.

    word = 'Python'
    word[0] = 'M'    
    # => TypeError: 'str' object does not support item assignment

    new_word = 'M' + word[1:] 
    # => 'Mython'
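
To run both steps in one program without stopping at the error, the failing assignment can be wrapped in try/except. This is only a sketch; try/except is not covered in this week's reading and is used here just to keep the example runnable.

```python
word = 'Python'

try:
    word[0] = 'M'                 # not allowed: strings are immutable
except TypeError as error:
    print('TypeError:', error)

# Building a new string works, and the original is unchanged.
new_word = 'M' + word[1:]
print(word)                       # => Python
print(new_word)                   # => Mython
```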

Length of a String

The built-in function len() returns the length of a string.

    s = 'megahypergigasuperlongstring' 
    print(len(s))
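
Because indexing starts at 0, the last character of a string s sits at index len(s) - 1. A minimal sketch:

```python
s = 'megahypergigasuperlongstring'

print(len(s))           # => 28
print(s[len(s) - 1])    # => g
print(s[-1])            # => g

# The empty string has length 0.
print(len(''))          # => 0
```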

Finding Strings in Strings

    # < string >.find(< string >)
    
    search_string.find(target_string)
    
    # => number of the first position
    #    in search_string at which 
    #    target_string appears

    # => -1 if target_string is not found 

Finding Strings in Strings

    #                          11111111 
    #                012345678901234567 
    search_string = 'Python is so cool!' 
    # note search_string is a VARIABLE
    
    target_string = 'cool'

    search_string.find(target_string) # => 13
    search_string.find('boring')      # => -1
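
The number returned by find can be used directly as a slice position. The sketch below assumes the variable names position and found, which are chosen for illustration. It cuts the target back out of the search string.

```python
search_string = 'Python is so cool!'
target_string = 'cool'

position = search_string.find(target_string)
print(position)    # => 13

# Slice from the match position over the length of the target.
found = search_string[position:position + len(target_string)]
print(found)       # => cool
```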

Finding Strings in Strings - Quiz

    # Which of the following evaluate to -1?
    
    'test'.find('t')
    "test".find('st')
    "Test".find('te')
    'west'.find('test') 

Finding Strings in Strings - Quiz 2

    # Assume that s is a variable that stores
    # an arbitrary string
    # Which of the following always has the value 0?
    
    s.find(s)
    s.find('s')
    's'.find(s)
    s.find('')
    s.find(s+'!!!')+1 

Finding with Numbers

    # < string >.find(< string >, < number >)
    
    search_string.find(target_string, pos)
    
    # => number of the first position
    #    in search_string at which  
    #    target_string appears 
    #    at or after pos

    # => -1 otherwise 
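
The second argument makes it possible to step through a string and find one occurrence after another: each search starts just past the previous match. A minimal sketch, assuming an example sentence and variable names (first, second, third, fourth) that are not part of the course material:

```python
search_string = 'the cat sat on the mat'

first = search_string.find('at')
second = search_string.find('at', first + 1)
third = search_string.find('at', second + 1)
fourth = search_string.find('at', third + 1)

print(first)     # => 5
print(second)    # => 9
print(third)     # => 20
print(fourth)    # => -1  (no further occurrence)
```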

Finding Strings - Quiz

    # For any variables s and t that are strings,
    # a variable i that is a number, 
    # which of the following is equivalent to
    s.find(t,i)

    s[i:].find(t)
    s.find(t)[:i]
    s[i:].find(t)+i
    s[i:].find(t[i:])
    # none of these

Hyper Text Markup Language (HTML)

HTML or HyperText Markup Language is the main markup language for creating web pages and other information that can be displayed in a web browser.

HTML is written in the form of HTML elements consisting of tags enclosed in angle brackets (like <html>), within the web page content.

HTML tags most commonly come in pairs like <h1> and </h1>, although some tags represent empty elements and so are unpaired, for example <img>.

The first tag in a pair is the start tag, and the second tag is the end tag (they are also called opening tags and closing tags). In between these tags web designers can add text, further tags, comments and other types of text-based content.
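
Because a start tag and its end tag are ordinary text, the string operations from this week are enough to pull out what lies between them. A minimal sketch, assuming a made-up one-line page and illustrative variable names:

```python
page = '<h1>Week 1</h1><p>Find data</p>'

start_tag = '<h1>'
end_tag = '</h1>'

# The content begins right after the start tag and ends where the end tag begins.
content_start = page.find(start_tag) + len(start_tag)
content_end = page.find(end_tag)

print(page[content_start:content_end])   # => Week 1
```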

Hyper Text Markup Language (HTML)

The purpose of a web browser is to read HTML documents and compose them into visible or audible web pages. The browser does not display the HTML tags, but uses the tags to interpret the content of the page.

HTML elements form the building blocks of all websites.

HTML allows images and objects to be embedded and can be used to create interactive forms. It provides a means to create structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, quotes and other items.

It can embed scripts written in languages such as JavaScript which affect the behavior of HTML web pages.

Web browsers can also refer to Cascading Style Sheets (CSS) to define the appearance and layout of text and other material.

To see an example of a super simple webpage, click here.

Extracting Links - Programming Quiz

      # Write Python code that initializes the variable
      # start_link to be the value of the position
      # at which <a href= occurs for the first time in 
      # the string variable page

Extracting Links - Solution to Programming Quiz

      # Write Python code that initializes the variable
      # start_link to be the value of the position
      # at which <a href= occurs for the first time in 
      # the string variable page

      start_link = page.find('<a href=')

Extracting the first URL - Programming Quiz

      # Write Python code that assigns to the variable
      # url a string that is the value of the first
      # URL that appears in a link tag in
      # the string variable page

      start_link = page.find('<a href=')

      # add missing code

Extracting the first URL - Solution to Programming Quiz

      # Write Python code that assigns to the variable
      # url a string that is the value of the first
      # URL that appears in a link tag in
      # the string variable page

      start_link = page.find('<a href=')
      start_quote = page.find('"', start_link)
      end_quote = page.find('"', start_quote + 1)
      url = page[start_quote + 1 : end_quote]
      print(url)
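
The solution assumes that the variable page already holds the text of a web page. To try it out, the same steps can be packaged as a function and run on a small sample page. This is a sketch under stated assumptions: the function name first_url and the sample page are made up for this illustration, the URL is expected to be enclosed in double quotes, and returning None when no link is found is one possible choice among several.

```python
def first_url(page):
    """Return the first URL in a link tag of page, or None if there is no link."""
    start_link = page.find('<a href=')
    if start_link == -1:
        return None
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    return page[start_quote + 1:end_quote]


page = ('<html><body>'
        '<p>Start with the '
        '<a href="http://docs.python.org/3/tutorial/">Python Tutorial</a>.</p>'
        '</body></html>')

print(first_url(page))                     # => http://docs.python.org/3/tutorial/
print(first_url('<p>no links here</p>'))   # => None
```

Without the check for -1, a page that contains no link would still produce a value for url, just not a meaningful one, because find would continue searching from position -1.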