Computer Science 15-112, Summer 2012
Class Notes:  Strings


  1. String Literals
    1. Escape Sequences
    2. Concatenated Literals
    3. Multi-Line String Literals
  2. String Constants
  3. String Operators
    1. String + and *
    2. String indexing and slicing
    3. The in operator
  4. Looping over Strings
    1. "for" loop with indexes
    2. "for" loop without indexes
    3. "for" loop with splitlines
    4. Example: isPalindrome
  5. Strings are Immutable
  6. String-related Built-In Functions
  7. String Methods
  8. String Formatting

Strings

  1. String Literals
    1. Escape Sequences
      print "Double-quote: \""
      print "Backslash: \\"
      print "Newline (in brackets): [\n]"
      print "Tab (in brackets): [\t]"
      print "These items are tab-delimited, 3-per-line:"
      print "abc\tdef\tg\nhi\tj\\\tk\n---"
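An escape sequence is typed as two characters but stands for a single character in the string. A small sketch to confirm this with len and repr (the variable names are only for illustration, and print is called with parentheses around one argument so the snippet runs in both Python 2 and Python 3):

```python
# Each escape sequence occupies exactly one character in the string.
newline = "\n"
tab = "\t"
backslash = "\\"
quote = "\""

print(len(newline))      # 1
print(len(backslash))    # 1
print(len("a\tb\nc"))    # 5: a, tab, b, newline, c

# repr shows the escaped form, which helps when hunting for hidden characters.
print(repr("a\tb\nc"))   # 'a\tb\nc'
```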
       
    2. Concatenated Literals
      s = "abc" "def"  # ok
      print s
      s = s "def" # syntax error (adjacency only concatenates string literals, not variables)
       
    3. Multi-Line String Literals
      s = """
      multi-line
      text!
      """
      print repr(s)  # prints '\nmulti-line\ntext!\n'


      s = """\
      Once again, but
      without extra newlines\
      """
      print repr(s)  # prints 'Once again, but\nwithout extra newlines'

       
  2. String Constants
    string.ascii_letters     'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    string.ascii_lowercase   'abcdefghijklmnopqrstuvwxyz'
    string.ascii_uppercase   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    string.digits            '0123456789'
    string.hexdigits         '0123456789abcdefABCDEF'
    string.letters           See documentation for details.
    string.lowercase         'abcdefghijklmnopqrstuvwxyz' (on most systems)
    string.octdigits         '01234567'
    string.punctuation       '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    string.printable         digits + letters + punctuation + whitespace
    string.uppercase         'ABCDEFGHIJKLMNOPQRSTUVWXYZ' (on most systems)
    string.whitespace        space + tab + linefeed + return + formfeed + vertical tab (on most systems)
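These constants are most useful together with the in operator. A minimal sketch, assuming two hypothetical helper functions (countDigits and isHexString are not part of the string module); it uses only constants that exist in both Python 2 and Python 3:

```python
import string

def countDigits(s):
    # Count the characters of s that appear in string.digits.
    count = 0
    for c in s:
        if c in string.digits:
            count += 1
    return count

def isHexString(s):
    # True if s is non-empty and every character is a hex digit.
    if len(s) == 0:
        return False
    for c in s:
        if c not in string.hexdigits:
            return False
    return True

print(countDigits("15-112, Summer 2012"))   # 9
print(isHexString("c0ffee"))                # True
print(isHexString("coffee"))                # False ('o' is not a hex digit)
```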

     

  3. String Operators
    1. String + and *
      print "abc" + "def"
      print "abc" * 3
      print "abc" + 3   # error: cannot add a string and a number (TypeError)

       
    2. String indexing and slicing
      print "abcdefgh"
      print "abcdefgh"[0]
      print "abcdefgh"[1]
      print "abcdefgh"[2]
      print "---------------"
      print "abcdefgh"[-1]
      print "abcdefgh"[-2]
      print "---------------"
      print "abcdefgh"[0:3]
      print "abcdefgh"[1:3]
      print "abcdefgh"[2:3]
      print "abcdefgh"[3:3]
      print "abcdefgh"[3:]
      print "abcdefgh"[:3]
      print "---------------"
      print "abcdefgh"[len("abcdefgh")-1]
      print "abcdefgh"[len("abcdefgh")]    # error: index out of range
      print "abcdefgh"[22]                 # error: index out of range
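Indexing past the end of a string is an error, but slicing past the end is not, and a slice can take a third value (the step). A short sketch of both behaviors, using the same sample string as above (the try/except is only there so the snippet runs to completion):

```python
s = "abcdefgh"

# Slicing quietly stops at the end of the string.
print(s[3:22])     # defgh
print(repr(s[22:]))  # '' (an empty string)

# A third slice value is the step.
print(s[0:8:2])    # aceg
print(s[::-1])     # hgfedcba

# Negative indexes work in slices, too.
print(s[-3:])      # fgh
print(s[:-3])      # abcde

# Indexing (not slicing) out of range raises IndexError.
try:
    s[22]
except IndexError:
    print("s[22] raised IndexError")
```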

       
    3. The in operator
      print "irate" in "Pirates"
      print "quit" in "Steelers"
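A few more cases worth trying: the in operator is case-sensitive, and it looks for the characters as one adjacent run. A small sketch (the variable team is only for illustration):

```python
team = "Pirates"

print("irate" in team)            # True
print("pirate" in team)           # False: "p" does not match "P"
print("pirate" in team.lower())   # True
print("Pts" in team)              # False: the characters must be adjacent
print("" in team)                 # True: the empty string is in every string
print("z" not in team)            # True
```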

       
  4. Looping over Strings
    1. "for" loop with indexes

      s = "abcd"
      for i in xrange(len(s)):
          print i, s[i]

       
    2. "for" loop without indexes

      s = "abcd"
      for c in s:
          print c
       
    3. "for" loop with splitlines

      s = """
      This is a sample
      multi-line
      string
      """
      print "Lines with splitlines():"
      for line in s.splitlines():
          print " line:", line

      print "Lines with splitlines(True):"
      for line in s.splitlines(True):
          print " line:", line
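The difference between the two loops is easier to see by printing the lists themselves. A minimal sketch, with the multi-line string written on one line using \n (the variable names are only for illustration). With splitlines(True) each line keeps its newline, which is why the second loop above appears double-spaced: print adds a newline of its own.

```python
s = "\nThis is a sample\nmulti-line\nstring\n"

withoutEnds = s.splitlines()       # newlines are removed
withEnds = s.splitlines(True)      # newlines are kept

print(withoutEnds)  # ['', 'This is a sample', 'multi-line', 'string']
print(withEnds)     # ['\n', 'This is a sample\n', 'multi-line\n', 'string\n']

# Since the line endings are kept, joining the pieces rebuilds the original.
print("".join(withEnds) == s)      # True
```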
       
    4. Example: isPalindrome
      # There are many ways to write isPalindrome(s)
      # Here are several.  Which way is best?
      
      def isPalindrome1(s):
          reverse = ""
          for c in s:
              reverse = c + reverse
          return (reverse == s)
      
      def isPalindrome2(s):
          reverse = s[::-1]
          return (reverse == s)
      
      def isPalindrome3(s):
          return (s[::-1] == s)
      
      def isPalindrome4(s):
          for i in xrange(len(s)):
              if (s[i] != s[len(s)-1-i]):
                  return False
          return True
      
      def isPalindrome5(s):
          for i in xrange(len(s)):
              if (s[i] != s[-1-i]):
                  return False
          return True
      
      def isPalindrome6(s):
          while (len(s) > 1):
              if (s[0] != s[-1]):
                  return False
              s = s[1:-1]
          return True
  5. Strings are Immutable
    # You cannot change strings!  They are immutable.
    s = "abcde"
    s[2] = "z"              # Error! Cannot assign into s[i]
    
    # Instead, you must create a new string.
    # (More on this next week, once we cover lists...)
    s = "abcde"
    s = s[:2] + "z" + s[3:]  # Works, but builds a whole new string each time (inefficient inside a tight loop)
    print s                  # prints abzde
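To see that the slicing approach really creates a new string rather than changing the old one, keep a second variable pointing at the original. A minimal sketch, assuming a hypothetical helper named setCharAt (the try/except is only there so the snippet runs to completion):

```python
def setCharAt(s, i, c):
    # Returns a new string like s, but with c at index i.
    return s[:i] + c + s[i+1:]

s = "abcde"
t = s                     # t refers to the same string as s
s = setCharAt(s, 2, "z")  # builds a new string; the old one is untouched
print(s)                  # abzde
print(t)                  # abcde

# Assigning into a string raises TypeError.
try:
    t[2] = "z"
except TypeError:
    print("assigning to t[2] raised TypeError")
```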
  6. String-related Built-In Functions
    bin, chr, eval, hex, len, oct, ord, raw_input, repr, reversed, str
    Most of these are clear, but here is how you use reversed:
    s = "abcd"
    for c in reversed(s):
        print c
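A quick sketch of several of the other functions listed above. The helper nextLetter is hypothetical and only shows how ord and chr work together; oct, eval, and raw_input are left out here.

```python
def nextLetter(c):
    # Returns the character whose code is one more than c's code.
    return chr(ord(c) + 1)

print(ord("A"))                    # 65
print(chr(65))                     # A
print(nextLetter("A"))             # B
print(hex(255))                    # 0xff
print(bin(5))                      # 0b101
print(len("abcd"))                 # 4
print(str(42) + "!")               # 42!
print(repr("a\tb"))                # 'a\tb'

# reversed gives the characters one at a time; join puts them back together.
print("".join(reversed("abcd")))   # dcba
```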

     
  7. String Methods
    str.capitalize()
    str.center(width[, fillchar])
    str.count(sub[, start[, end]])
    str.decode([encoding[, errors]])
    str.encode([encoding[, errors]])

    str.endswith(suffix[, start[, end]])
    str.expandtabs([tabsize])
    str.find(sub[, start[, end]])
    str.format(*args, **kwargs)
    str.index(sub[, start[, end]])
    str.isalnum()
    str.isalpha()
    str.isdigit()
    str.islower()
    str.isspace()
    str.istitle()
    str.isupper()
    str.join(iterable)
    str.ljust(width[, fillchar])
    str.lower()
    str.lstrip([chars])
    str.partition(sep)
    str.replace(old, new[, count])
    str.rfind(sub[, start[, end]])
    str.rindex(sub[, start[, end]])
    str.rjust(width[, fillchar])
    str.rpartition(sep)
    str.rsplit([sep[, maxsplit]])
    str.rstrip([chars])
    str.split([sep[, maxsplit]])
    str.splitlines([keepends])
    str.startswith(prefix[, start[, end]])
    str.strip([chars])
    str.swapcase()
    str.title()
    str.translate(table[, deletechars])
    str.upper()
    str.zfill(width)
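A few of these methods in action. This is only a sampling, on an illustrative string; note that each method returns a new value and leaves the original string unchanged.

```python
s = "  The Steelers and the Pirates  "
t = s.strip()                    # remove leading and trailing whitespace

print(t)                         # The Steelers and the Pirates
print(t.upper())                 # THE STEELERS AND THE PIRATES
print(t.lower().count("the"))    # 2
print(t.find("Pirates"))         # 21
print(t.find("Penguins"))        # -1 (find returns -1 when not found)
print(t.replace("and", "&"))     # The Steelers & the Pirates
print(t.split())                 # ['The', 'Steelers', 'and', 'the', 'Pirates']
print("-".join(["a", "b", "c"])) # a-b-c
print(t.startswith("The"))       # True
print("112".isdigit())           # True
print("42".zfill(5))             # 00042
print(len(s))                    # 32 (s itself was never changed)
```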
     
  8. String Formatting
    1. Some examples
      s = "The %s have won %d Super Bowls" % ("Steelers", 6)
      print s  # prints:  The Steelers have won 6 Super Bowls

      s = "The square root of %d is about %0.2f, give or take" % (5, 5**0.5)
      print s # prints: The square root of 5 is about 2.24, give or take
       
    2. Syntax:  format % values
      format is a string containing conversion specifiers as such:
      1. The '%' character, which marks the start of the specifier.
      2. Mapping key (optional), consisting of a parenthesised sequence of characters (for example, (somename)).
      3. Conversion flags (optional), which affect the result of some conversion types.
      4. Minimum field width (optional). If specified as an '*' (asterisk), the actual width is read from the next element of the tuple in values, and the object to convert comes after the minimum field width and optional precision.
      5. Precision (optional), given as a '.' (dot) followed by the precision. If specified as '*' (an asterisk), the actual precision is read from the next element of the tuple in values, and the value to convert comes after the precision.
      6. Length modifier (optional).
      7. Conversion type.
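The optional parts of a specifier are easier to follow with one example of each. A minimal sketch (the values and the square brackets are only there to make the field widths visible):

```python
# Mapping key: values is a dictionary instead of a tuple.
s1 = "%(team)s: %(wins)d wins" % {"team": "Steelers", "wins": 6}
print(s1)   # Steelers: 6 wins

# Minimum field width: pads on the left with spaces.
s2 = "[%5d]" % 42
print(s2)   # [   42]

# Precision: digits after the decimal point.
s3 = "[%.3f]" % 3.14159
print(s3)   # [3.142]

# Width and precision together.
s4 = "[%8.3f]" % 3.14159
print(s4)   # [   3.142]

# '*' width: the width (5) comes from the tuple, just before the value.
s5 = "[%*d]" % (5, 42)
print(s5)   # [   42]

# '*' precision: the precision (2) comes from the tuple, just before the value.
s6 = "[%.*f]" % (2, 3.14159)
print(s6)   # [3.14]
```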
         
    3. Conversion Flags
      Flag   Meaning
      '#'    The value conversion will use the “alternate form” (where defined below).
      '0'    The conversion will be zero padded for numeric values.
      '-'    The converted value is left adjusted (overrides the '0' conversion if both are given).
      ' '    (a space) A blank should be left before a positive number (or empty string) produced by a signed conversion.
      '+'    A sign character ('+' or '-') will precede the conversion (overrides a “space” flag).
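Each flag is shown below on a small example. This is a sketch with illustrative values; the square brackets only make the padding visible.

```python
noFlag     = "[%5d]" % 42     # [   42]   right adjusted, padded with spaces
zeroPad    = "[%05d]" % 42    # [00042]   '0': padded with zeros
leftAdjust = "[%-5d]" % 42    # [42   ]   '-': left adjusted
minusWins  = "[%-05d]" % 42   # [42   ]   '-' overrides '0'
plusSign   = "[%+d]" % 42     # [+42]     '+': always show the sign
spaceSign  = "[% d]" % 42     # [ 42]     ' ': blank before a positive number
spaceNeg   = "[% d]" % -42    # [-42]     negative numbers still get their '-'
plusWins   = "[%+ d]" % 42    # [+42]     '+' overrides ' '
plainHex   = "%x" % 255       # ff
altHex     = "%#x" % 255      # 0xff      '#': alternate form for hexadecimal

print(noFlag)
print(zeroPad)
print(leftAdjust)
print(plusSign)
print(altHex)
```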

       

    4. Conversion Types
      Conversion   Meaning
      'd'          Signed integer decimal.
      'i'          Signed integer decimal.
      'o'          Signed octal value.
      'u'          Obsolete type – it is identical to 'd'.
      'x'          Signed hexadecimal (lowercase).
      'X'          Signed hexadecimal (uppercase).
      'e'          Floating point exponential format (lowercase).
      'E'          Floating point exponential format (uppercase).
      'f'          Floating point decimal format.
      'F'          Floating point decimal format.
      'g'          Floating point format. Uses lowercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise.
      'G'          Floating point format. Uses uppercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise.
      'c'          Single character (accepts integer or single character string).
      'r'          String (converts any Python object using repr()).
      's'          String (converts any Python object using str()).
      '%'          No argument is converted, results in a '%' character in the result.
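A sketch with one small example per conversion type (the values are only for illustration). The three 'g' lines show it switching between decimal and exponential format depending on the exponent.

```python
results = [
    "%d" % 42,            # 42
    "%o" % 8,             # 10
    "%x" % 255,           # ff
    "%X" % 255,           # FF
    "%e" % 12345.678,     # 1.234568e+04
    "%E" % 12345.678,     # 1.234568E+04
    "%f" % 2.5,           # 2.500000
    "%g" % 123.456,       # 123.456       (decimal format)
    "%g" % 0.00001234,    # 1.234e-05     (exponent is less than -4)
    "%g" % 1234567.0,     # 1.23457e+06   (exponent is not less than precision)
    "%c" % 65,            # A             (integer character code)
    "%c" % "A",           # A             (single character string)
    "%r" % "hi",          # 'hi'          (uses repr, so the quotes appear)
    "%s" % "hi",          # hi            (uses str)
    "%d%%" % 50,          # 50%
]

for result in results:
    print(result)
```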
