yellowpigs.net

This is a document I wrote for a short Python tutorial at AdaCamp DC in July 2012. The intended audience was people (primarily women) with no previous programming experience.

What is Python?

The short version: A programming language that can be used to write short scripts to automate tasks, games, web apps, scientific computing applications, and many other things.

The longer version:

Do I need to install Python?

Possibly.

Mac OS X and Linux come with Python already installed. From a terminal window, you can type python, and you'll be in an interactive Python shell, known typically as the Python interpreter. (Terminal window? Where's that? In OS X you can open a terminal from something like Applications > Utilities. There are a number of ways to bring up a terminal window in Linux; see for example https://help.ubuntu.com/community/UsingTheTerminal/#Starting_a_Terminal).

For Windows, you'll need to download Python, which you can get from http://python.org/download. Version 2.7.3 is the first one listed, so go with that. The examples in this doc should work with 2.7.x (and I expect 2.6.x and some earlier versions), but not the 3.x versions because the syntax has changed slightly.

Set up instructions (and tutorials) are here: https://openhatch.org/wiki/Boston_Python_Workshop_6/Friday#Goal_.231:_set_up_Python (Thanks to Jessica McKellar!)

You can execute Python code directly in the interpreter, or by writing your program in a file, saving the file, and then executing the file from the command line. (For this tutorial, I'll assume you're using the interpreter, but you can use a file if you want. If you name your file "myprogram.py", you can execute it with python myprogram.py. The .py at the end of the file name is a Python convention.)

Some simple programs

Try giving the Python interpreter the following statements. Type each of these lines separately at the '>>>' prompt, hitting Enter after each one. What happens after you hit Enter? What does each of the mathematical operators do?

2 + 2
10 - 3
3 * 4
2 * (2 + 3)
2 ** 5

Can you figure out what these operators do? Experiment more if needed.

5 / 3
5.0 / 3
5 % 3
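
If you want to check your guesses, here's a small sketch. Treat it as an illustration rather than part of the original exercise: it uses //, an operator not covered in this tutorial, because // drops the remainder on both Python 2.7 and 3.x (on 2.7, 5 / 3 gives the same result). The variable names are made up for this example, and print is written with parentheses around a single value so the code runs on either version.

```python
# Floor division: how many whole times 3 goes into 5.
# On Python 2.7, 5 / 3 gives this same result.
whole = 5 // 3

# With a float on either side, / keeps the fractional part.
exact = 5.0 / 3

# % (modulo) gives the remainder left over after the division.
remainder = 5 % 3

# The two integer results fit back together to make the original number.
rebuilt = whole * 3 + remainder

print(whole)
print(exact)
print(remainder)
print(rebuilt)
```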

Congratulations, you're now a programmer. Really. Well, a little bit. You gave the computer instructions in Python, and it executed them. Possibly you experimented and typed some variations -- experimentation is good; you won't break anything. Maybe you didn't type something exactly right and Python returned a cryptic-looking error; no worries, just try typing it again.

(If you did see an error message -- either an exception or a traceback -- that's Python's way of "helpfully" telling you in a very terse, precise way why it failed to understand your instruction. Getting errors is very common when programming, even for experienced programmers. Just as authors spend a lot of time deleting and editing, programmers spend a lot of time debugging and revising. Experienced programmers likely make fewer mistakes because they've learned from their previous mistakes, but mostly they still make mistakes and what they've gained from their experience is how to be better at finding and fixing these mistakes quickly.)

Variables

Variables are names that are associated with values.

Type this in the interpreter:

answer = 42

The Python interpreter doesn't display anything in response to your instruction, but it has done something. It has assigned the value 42 to the variable named answer. Now try these:

answer
answer == 42
answer == 43
answer != 42
answer != 43

(Not to be confused with the assignment operator = that we've just learned, the == operator checks for equality and returns a boolean value: True or False. What do you think != does?)

You can assign the value of one variable to another variable:

favoritenumber = answer
favoritenumber
favoritenumber == answer

You can use any of the other operators we've used with numbers with the variable answer as well:

answer + 2
anothernumber = answer + 3
anothernumber
anothernumber > answer

See if you can figure out what's going on here:

answer = answer + 2
answer
favoritenumber

There's also a more concise way to write this in Python (and some other languages):

answer += 2
answer
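
Here's the sequence above replayed from the start, as a sketch you could save in a file and run. The point it tries to show is that favoritenumber received the value answer had at that moment, not a link to answer itself. (print has parentheses around a single value so this also runs on Python 3.)

```python
answer = 42
favoritenumber = answer   # favoritenumber now holds 42

answer = answer + 2       # work out 42 + 2, then store the result: 44
answer += 2               # shorthand for the same kind of update: now 46

# favoritenumber kept the value it was given; later changes to answer
# do not reach back and change it.
print(answer)
print(favoritenumber)
print(answer == favoritenumber)
```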

The variables we've seen all have values that are numbers, specifically integers. Python variables can have other values too, like strings (text), floats (floating point numbers, aka decimals), lists containing multiple values, and many other kinds of things. These "kinds of things" are called datatypes. Here's the relevant syntax for variable assignment of strings, floats, and lists:

question = 'What is the ultimate answer to life, the universe, and everything?'
pie = 3.14
morenumbers = [10, 20, 30, 40]
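
If you're curious which datatype a value has, you can ask Python. The sketch below uses the builtin type(), which isn't covered elsewhere in this tutorial, and compares its result against int, str, float, and list, which are Python's own names for these datatypes. (print is parenthesized so the code runs on both Python 2.7 and 3.x.)

```python
answer = 42
question = 'What is the ultimate answer to life, the universe, and everything?'
pie = 3.14
morenumbers = [10, 20, 30, 40]

# type() reports the datatype of a value. Each comparison below is True.
print(type(answer) == int)
print(type(question) == str)
print(type(pie) == float)
print(type(morenumbers) == list)
```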

Hello world!

For some reason the "Hello world" program has become the standard first program. So here's "Hello world" in Python:

print 'Hello world!'

That's it!

You can also write:

print "Hello world!"

with double quotes. Python allows either single quotes or double quotes. Just make sure whichever one you use on the left matches the one you use on the right.
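
One practical reason to have both kinds of quotes: whichever kind you use on the outside, you can use the other kind freely on the inside. Here's a small sketch; the example sentences and variable names are made up. (print is written with parentheses around a single value so it behaves the same on Python 2.7 and 3.x.)

```python
# Double quotes on the outside let you use an apostrophe inside...
contraction = "It's a yellow pig!"

# ...and single quotes on the outside let you use double quotes inside.
quotation = 'She said "hello" to the pig.'

# The quote marks are not part of the string itself, so these are equal.
same = 'Hello world!' == "Hello world!"

print(contraction)
print(quotation)
print(same)
```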

How about something a bit more complicated?

name = 'Sara'
print 'Hello ' + name + '!'

What if you want to write an interactive program?

name = raw_input('What is your name? ')

You'll be prompted to enter something. Type a name and hit Enter.

print 'Hello ' + name + '!'

If you want to do more (with if)

So far we've seen that everything you type is executed in order, one line at a time. But you can also use control statements to control the flow of execution. The first control statement we'll see is the appropriately named if. if can be used alone, or in conjunction with else and sometimes also elif (a contraction of "else if").

Rather than always print the same greeting, here's how to print something different depending on the name:

if name == 'Sara':
  print 'Hi ' + name + '!'

Note the colon at the end of the first line and the indentation of the second line. Both have a specific meaning to Python and are not optional.

Probably nothing happened. You'll need to hit Enter again to unindent before the interpreter does anything. Now something will happen only if you said your name was 'Sara'.

Here's an example using if, elif, and else:

# There are a lot of Sara(h)'s
if name == 'Sara':
  print 'Hi ' + name + '!'
elif name == 'Sarah':
  print 'Hey there ' + name + '!'
else:
  print 'Hello ' + name + '!'

Pay careful attention to the indenting above. It is not optional. I use two spaces to indent; some people use four spaces or a tab or some other amount. The important thing is to be consistent, so that in the above example, the if, elif, and else lines are all flush left and the three print lines are all indented to the same column as each other. (I find it's easier to get the indenting right when writing code in a file rather than directly in the interpreter. Some text editors will even automatically indent for you.)

(The line that starts with # is a comment. The # is a comment character that tells Python to ignore that line, which means you can write a comment in English (or your preferred language) to explain what the code does (or why it does it, what you want it to do, or really anything). If you're reading someone else's code, look to the comments for clues. (Not all programming languages use # to signal comments. Some use // or /* ... */ or other characters.))

Try altering the above example to do something different. Be careful with the indenting.
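
Here's one possible variation, as a sketch: instead of printing inside each branch, it stores the text in a variable (I've called it greeting, a name that isn't in the original example) and prints once at the end. The name is set directly rather than read with raw_input so the example runs without any typing, and print is parenthesized so it also works on Python 3.

```python
name = 'Sarah'

# Python checks the conditions from top to bottom and runs only the
# first branch whose condition is True.
if name == 'Sara':
  greeting = 'Hi ' + name + '!'
elif name == 'Sarah':
  greeting = 'Hey there ' + name + '!'
else:
  greeting = 'Hello ' + name + '!'

print(greeting)
```

Try changing name to 'Sara' or to your own name and see which branch runs.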

Drawing a box

Python has many builtin functions (or builtins). For example, here's the range() function:

range(10)

range() is commonly used with the for ... in control statement. It's probably easiest to just see it in action:

for i in range(10):
  print i

or

for i in range(10):
  print i,

(What does that comma do?)
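
range() can also take a starting number and a step size. Here's a short sketch of the three forms. It wraps each call in list(), which makes no visible difference on Python 2.7 (where range() already gives you a list) but is needed to see the numbers on Python 3. The variable names are made up, and print is parenthesized for the same both-versions reason.

```python
# One argument: count from 0 up to, but not including, 10.
first_ten = list(range(10))

# Two arguments: start at 5 and stop just before 10.
five_to_nine = list(range(5, 10))

# Three arguments: the last one is the step size.
evens = list(range(0, 10, 2))

print(first_ten)
print(five_to_nine)
print(evens)
```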

You can also nest for statements like so (careful with those indents):

for i in range(5):
  print i,
  for j in range(5):
    print j,
  print ' '

Look at the output until you think you understand what happened. What happens if you change one of the numbers in the above program?

When you're ready to move on, here's how to draw a starry box:

for i in range(5):
  for j in range(5):
    print '*',
  print ' '

We've used the number 5 twice in our code. This makes it a good candidate for refactoring. We can factor it out with a variable:

size = 5
for i in range(size):
  for j in range(size):
    print '*',
  print ' '

Now we can try this with different values of size. But still, retyping (or copy/pasting) this is kind of tedious. So we can define our own function called box (or whatever you want to call it), which takes one argument called size (or whatever you want to call it) as input:

def box(size):
  "Draw a box of stars with dimensions size by size."
  for i in range(size):
    for j in range(size):
      print '*',
    print ' '

and then call it like so:

box(5)
box(7)

(See the line in there that looks like English? That's a docstring -- a special kind of "comment-y" thing for Python functions. It's entirely optional, but if you use it, you're automatically populating standard Python help content for your code. If you end up writing code that others maintain, they'll likely really appreciate having help, so writing docstrings for all your functions is a great habit to have. And even if you're the only one reading your code, docstrings may still help you so you don't have to remember what your functions do.)
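
To see where the docstring ends up, here's a small sketch. Python stores the docstring on the function in an attribute named __doc__, and the builtin help() displays it. The body of this box is a shortened stand-in for the one above: it prints each row in one go by repeating the string '* ' size times, and its print is parenthesized so the code runs on both Python 2.7 and 3.x.

```python
def box(size):
  "Draw a box of stars with dimensions size by size."
  for i in range(size):
    # '* ' * size repeats the two-character string size times.
    print('* ' * size)

# The docstring is stored on the function itself.
print(box.__doc__)

# In the interpreter, help(box) shows the same text along with the
# function's name and arguments.
```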

You can also define a function that takes two arguments (or three arguments, or no arguments):

def box(size, char):
  "Draw a box of character char with dimensions size by size."
  for i in range(size):
    for j in range(size):
      print char,
    print ''

You call it like so:

box(5, '#')

Actually, it turns out there's an easier way to print something repeatedly. Try this:

'#' * 5

So here's an alternate version of the function:

def box(size, char):
  "Draw a box of character char with dimensions size by size."
  for i in range(size):
    print char * size


box(5, '#')

It's very often the case that there are multiple ways to write code that produce the same output. Sometimes one way is faster than another, or easier to read than another, and sometimes there's really no sense in which one is "better" than another.

(One other note: The box function just prints things. Usually though, you'll see functions that return values. To be explicit, we could have ended the box function with the line return None.)
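
For comparison, here's a sketch of a function that returns a value instead of printing. box_text is a hypothetical name, and append and join are list and string methods that aren't covered in this tutorial. The function builds the box as a single string and hands it back, so the caller decides what to do with it. (print is parenthesized so this runs on both Python 2.7 and 3.x.)

```python
def box_text(size, char):
  "Return a size-by-size box of char as one string, without printing it."
  rows = []
  for i in range(size):
    rows.append(char * size)   # add one row, such as '###', to the list
  # Join the rows with newline characters so each row gets its own line.
  return '\n'.join(rows)

picture = box_text(3, '#')   # nothing is printed yet
print(picture)
```

Because the result is an ordinary value, you can store it in a variable, compare it with ==, or print it later.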

Functions are really common in Python, as they allow you to organize your code and put the nitty-gritty details in functions rather than in the main code block. Probably most of the code that you write will be in functions.

Fun with files

Here's how to do some stuff with files. First, we need a file. For these examples, I created a file called adacamp.wiki, which contains the wiki markup of the AdaCamp page from the Geek Feminism Wiki.

You'll need to create this file (or some other file) now:

(I know I'm handwaving over this. Ask and someone will show you how to get this file and where to save it.)

Once you have the file, here's how we can read the contents of the file:

for line in open('adacamp.wiki'):
  print line,

And how to count the number of lines in the file:

count = 0
for line in open('adacamp.wiki'):
  count += 1
print "The file contains", count, "lines"

Try to understand why this works.
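
If you don't have adacamp.wiki handy, here's a self-contained sketch of the same counting loop. It first writes a tiny stand-in file (the file name sample.wiki and its three lines are made up) into a temporary directory, then counts its lines. The os and tempfile modules and the write and close methods aren't covered in this tutorial, and print is parenthesized so the code runs on both Python 2.7 and 3.x.

```python
import os
import tempfile

# Create a small stand-in file so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), 'sample.wiki')
sample = open(path, 'w')
sample.write('first line\nsecond line\nthird line\n')
sample.close()

count = 0
for line in open(path):
  count += 1   # this runs once for every line in the file
print(count)
```

The for loop hands you the file one line at a time, so adding 1 on each pass leaves count equal to the number of lines.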

What if we want to find the wiki links in the file? Wiki links start with '[['. The following examples show what you can do with the in operator:

line = '*[[Imposter syndrome]]'
'[[' in line
line = 'Many of the sessions produced or updated Geek Feminism Wiki pages:'
'[[' in line

Putting this together with the way we read the contents of the file, and the if statement, we can build:

for line in open('adacamp.wiki'):
  if '[[' in line:
    print line,

(It turns out that link extraction is pretty hard, and this doesn't give us quite what we want. One way to get just the links and not the surrounding line is to import the regular expression module and use it. Regular expressions are pretty hard to read, so I won't cover them here. But it's worth noting that sometimes when you are trying to write something, someone may have already written a module containing chunks of the code that are useful to you.)
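
In case you're curious what that looks like, here's a minimal sketch using findall from Python's re module. The pattern and the sample line are my own illustration, not a complete wiki parser: the pattern grabs whatever sits between '[[' and the next ']]', so a link written as [[Page|label]] would come back as 'Page|label'. (print is parenthesized so this runs on both Python 2.7 and 3.x.)

```python
import re

line = '*[[Imposter syndrome]] and [[AdaCamp]]'

# \[\[ and \]\] match the literal brackets; (.*?) captures the text
# between them, stopping at the first ]] it finds.
links = re.findall(r'\[\[(.*?)\]\]', line)
print(links)
```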

Determining the frequency of words from a file

Warning: This part is much harder, and I'm going really fast here. I just want to give you a sense of the kinds of practical things you can accomplish with Python. If you think you've had enough for now or want a more gentle introduction, this would be a good place to stop.

How do we go about determining the frequency of words in a file? First we need to read in the lines of a file, then split the lines into words, then keep track of how many times we've seen each word. Python has a datatype, called a dictionary, that's just the right type of thing to keep track of this. A dictionary associates keys (in this case, words) with values (in this case, the number of times each word has been seen). We'll see a dictionary that we've named freq here, and also some new functions (really, methods) that strip the whitespace (including the newline) off the ends of lines and that split text into words wherever there's whitespace.

for line in open('adacamp.wiki'):
  print line.strip().split()

freq = {}
for line in open('adacamp.wiki'):
  words = line.strip().split()
  for word in words:
    if word in freq:
      freq[word] += 1
    else:
      freq[word] = 1
print freq
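
To watch the counting happen on something small, here's a sketch that runs the same loop over a short list of lines instead of a file. The two sample lines are made up, and print is parenthesized so the code runs on both Python 2.7 and 3.x.

```python
# A stand-in for the lines of a file; each one ends with a newline.
lines = ['the pig is yellow\n', 'the pig is happy\n']

# strip() removes the newline, then split() breaks the line into words.
print(lines[0].strip().split())

freq = {}
for line in lines:
  for word in line.strip().split():
    if word in freq:
      freq[word] += 1   # seen before: add one to its count
    else:
      freq[word] = 1    # first time we've seen this word

print(freq['the'])
print(freq['yellow'])
```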

This prints the results in no particular order. There's some specific Python syntax to sort a dictionary. You can think of this as an idiom if you like. And if you don't remember, this is the kind of thing that you can look up with your favorite search engine ("sort python dictionary") or find on a coding website like http://www.stackoverflow.com. Here's how to print the results in order from most to least frequent word:

for item in sorted(freq, key=freq.get, reverse=True):
  print item, freq[item]
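
Here's the same idiom on a tiny made-up dictionary, as a sketch of what each piece does. sorted(freq) goes through the dictionary's keys (the words), key=freq.get tells it to order them by the count that freq.get looks up for each word, and reverse=True puts the biggest count first. I picked counts that are all different so the order is unambiguous. (print is parenthesized so this runs on both Python 2.7 and 3.x.)

```python
freq = {'pig': 3, 'yellow': 2, 'camp': 1}

# Order the words by their counts, largest first.
ordered = sorted(freq, key=freq.get, reverse=True)
print(ordered)

# freq.get(word) looks up a word's count, just like freq[word].
print(freq.get('pig'))
```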