Coroutines
Coroutines are, like generators, a distinctive and interesting feature of Python. They enable us to implement an algorithm using a nice, clean, function-like definition -- but to pause the execution of the algorithm to wait for input.
Generators pause after generating output, waiting for the caller to request more. Coroutines pause execution waiting for the caller to send input. In coroutines, the "yield" represents a value and will most often appear on the right side of an equal sign, or as the argument to a function -- by contrast, in generators, "yield" is used as a command, much like "return".
Coroutines are probably best explained by example. Consider the following simple example, in which the coroutine waits for a value and then prints it. We define the coroutine, create one and assign it to a variable, advance it to the "yield", and then send it values, one at a time:
    def printValue():
        while True:
            value = (yield)
            print(value)

    pv = printValue()   # Create and assign
    next(pv)            # Advance until it blocks at the first "(yield)"
    pv.send("This string is sent and becomes the value of the (yield)")
    pv.send("And again...")
    pv.send("And again...")

    # We can also do it in a loop
    sentence = "The quick brown fox jumps over the lazy old dog."
    for word in sentence.split():
        pv.send(word)
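To make the generator/coroutine contrast concrete, here is a minimal sketch that puts one of each side by side. The names count_up and collect_into are made up for illustration, and the coroutine appends to a list instead of printing so the result can be inspected. It uses the built-in next(), which does the same job as calling .next() on the object and works in both Python 2.6+ and Python 3.

```python
def count_up(limit):
    """Generator: each yield hands a value OUT to the caller."""
    n = 0
    while n < limit:
        yield n
        n += 1


def collect_into(store):
    """Coroutine: each (yield) takes a value IN from the caller."""
    while True:
        value = (yield)
        store.append(value)


# The caller pulls values out of the generator...
pulled = list(count_up(3))

# ...but pushes values into the coroutine.
received = []
collector = collect_into(received)
next(collector)  # advance to the first (yield)
for item in pulled:
    collector.send(item * 10)

print(pulled)
print(received)
```

Notice that the generator decides what the values are, while the coroutine only finds out what they are when the caller sends them.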
Much as we could with generators, we can also organize coroutines into a pipeline. But there is an important, albeit somewhat subtle, difference. When pipelining generators, the data falls down through the pipeline: the consumer at the end pulls each value through the stages before it. When we pipeline coroutines, the data is pushed upward: the caller sends each value into the first stage, which sends it on to its target. Compare the example below to the generator example in the prior lecture.
    def printWord():
        while True:
            word = (yield)
            print(word)

    def numberWord(targetCR):
        number = 0
        while True:
            word = (yield)
            targetCR.send(str(number) + ": " + word)
            number += 1

    def upperWord(targetCR):
        while True:
            word = (yield)
            targetCR.send(word.upper())

    def wordNoPeriods(targetCR):
        while True:
            word = (yield)
            targetCR.send(word.replace(".", ""))

    # I like to use the ;-semicolon to initialize and advance on the same line
    pw = printWord(); next(pw)
    nw = numberWord(pw); next(nw)
    uw = upperWord(nw); next(uw)
    wnp = wordNoPeriods(uw); next(wnp)

    sentence = "The quick brown fox jumps over the lazy old dog."
    for word in sentence.split():
        wnp.send(word)
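For comparison, here is roughly what the same three stages might look like as a generator pipeline. This is a sketch written for these notes, not the actual example from the prior lecture, and the function names are assumptions. Each stage pulls words from the stage it wraps, so nothing happens until the loop at the end asks for a value.

```python
def no_periods(words):
    # Pull each word from the source and strip periods
    for word in words:
        yield word.replace(".", "")


def upper(words):
    # Pull each word from the previous stage and upper-case it
    for word in words:
        yield word.upper()


def numbered(words):
    # Pull each word from the previous stage and prefix a counter
    for number, word in enumerate(words):
        yield str(number) + ": " + word


sentence = "The quick brown fox jumps over the lazy old dog."

# The pipeline is built from the source outward; the consumer drives it.
pipeline = numbered(upper(no_periods(sentence.split())))
lines = list(pipeline)
for line in lines:
    print(line)
```

In the coroutine version the pipeline is wired from the last stage back to the first, and the for loop drives it by sending. Here the wiring starts at the source, and the consumer drives it by iterating.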
Regular Expressions
Today we chatted a bit about the relationship between regular expressions, regular languages, and finite state machines (FSMs). You'll get that discussion with a lot more rigor in 15-251, so I don't want to emphasize it here. It is often said that, to teach, you should "Tell them what you are going to tell them. Tell them. And then tell them what you've told them." Think of it this way: we just did step #1 -- we'll leave steps #2 and #3 for 15-251.
We then discussed how to use regular expressions in Python. The resource you want as a reference is the Python Regular Expression HOWTO. It is excellent. We emphasized the following:
- import re
- re.compile("pattern")
- re.compile("pattern", re.IGNORECASE)
- match() vs search()
- group()/group(0), group(N)
- start(), end(), span()
- findall() vs finditer()
- capturing groups and positional registers (backreferences such as \1)
- (?:...) for a non-capturing group
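The items above can be tried out in one short session. This is a minimal sketch: the sample text and variable names are made up for illustration, and only standard re calls are used.

```python
import re

text = "Due 01/15/2013, extended to 02/01/2013"

# Two capturing groups (month, day) and one non-capturing group (century)
p = re.compile("([0-9]{2})/([0-9]{2})/(?:19|20)[0-9]{2}")

# match() only succeeds at the start of the string; search() scans forward
at_start = p.match(text)
first = p.search(text)

whole = first.group()    # same as first.group(0): the entire match
month = first.group(1)   # first capturing group
day = first.group(2)     # second capturing group
where = first.span()     # the pair (first.start(), first.end())

# findall() returns strings (tuples of groups when the pattern has more
# than one capturing group); finditer() yields match objects instead
pairs = p.findall(text)
dates = [m.group() for m in p.finditer(text)]

# re.IGNORECASE makes letters match regardless of case
q = re.compile("due", re.IGNORECASE)
found_upper = q.search("DUE tomorrow") is not None
```

Because the century group is written with ?:, it does not show up in the findall() tuples or take a group number.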
One interesting example we did in class involved the need to escape the backslash when using a positional register such as \1, and the need to use ?: to avoid capturing a group:
    #!/usr/bin/python
    import re

    text = "01/01/2013 some other text 09/09/2013"

    # Let's find all dates
    p = re.compile("[0-9]+/[0-9]+/(19|20)[0-9]{2}")
    matches = p.finditer(text)
    for match in matches:
        print(match.group())

    print("")
    print("Capturing")
    print("")

    # Two (2) things to notice below:
    # 1. The ?: causes us to use the () to form a group, like () in math,
    #    but not to capture it into a group saved as a group()
    # 2. \1 represents the first captured group (vs a ?: non-captured group).
    #    Notice that we had to escape it as "\\1" to prevent Python
    #    from viewing it as an "escaped 1" (an octal escape for the character
    #    with code 1) and sending that as part of the string, instead of
    #    the \-slash, to the compile function.
    text = "01/01/2013 some other text 09/09/2013 and the date again: 01/01/2013"
    p = re.compile("([0-9]+/[0-9]+/(?:19|20)[0-9]{2}).*(\\1)")
    matches = p.finditer(text)
    print("Notice that we only print the repeated date")
    for match in matches:
        print(match.group(1) + " " + match.group(2))
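An alternative to doubling the backslash is a raw string literal (the r"..." prefix), which the Regular Expression HOWTO also recommends. The sketch below checks that the two spellings produce the same pattern string and shows what Python does with an unescaped "\1". The variable names are made up for illustration.

```python
import re

# The two spellings below are the same string once Python has parsed them
escaped = "([0-9]+/[0-9]+/(?:19|20)[0-9]{2}).*(\\1)"
raw = r"([0-9]+/[0-9]+/(?:19|20)[0-9]{2}).*(\1)"
same_pattern = (escaped == raw)

# Without either fix, "\1" is an octal escape: one character, code 1
unescaped_is_control_char = ("\1" == chr(1))

text = "01/01/2013 some other text 09/09/2013 and the date again: 01/01/2013"
m = re.compile(raw).search(text)
repeated = (m.group(1), m.group(2))
print(repeated)
```

Either form works; raw strings just tend to be easier to read once a pattern has several backslashes in it.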
Last year's TAs produced this handout [pdf] that may be helpful to you as a quick summary of the regular expression language.