Welcome to Programming for the Humanities
First, let us introduce ourselves! Click on the grey box below this one and press RUN at the top of your screen.
This is a Jupyter Notebook. It allows you to practice and prototype your code in a cloud environment (rather than directly on your computer) and to document your work through text boxes above and below each section of code.
The grey boxes below are all Python code boxes. If you type your code within them, you can run each one in turn. It is important to remember that although this is one big notebook, you must run each box separately (and in order) for the lesson to make sense.
If at any point the In [1] next to your grey box becomes stuck as In [*], hit the refresh (circular arrow) button at the top of the screen and begin again.
So, let us introduce ourselves and practice using the notebook! Click on the grey box below this one and then press the Run button at the top of your screen. When the questions appear, answer each one in turn.
name = input('What is your name? ')
discipline = input('What discipline do you work in? ')
department = input('Which department do you belong to? ')
rank = input('Are you a researcher, postgraduate or undergraduate student? ')
print("\nHello, " + name + "!\n"
      + "How is your research in " + discipline
      + " going over in the " + department + " department?\n"
      + "It's great to have so many " + rank + "s here with us today!")
Wasn't that fun? Now let's start with variable types!
The four answers you gave me earlier are known as strings or string literals. In Python, these are always enclosed in quotation marks (either ' ' or " "). They can include any character: a letter, a number or a symbol! There are many other types of variables you can use. The most common are:
- str (strings of letters, numbers and other characters)
- bool (true or false, written in Python as True and False; notice the capital letters! They behave like the numbers 1 and 0)
- int (signed integers, such as 1 and -1)
- long (long integers, for really, really big numbers; Python 2 only, since in Python 3 int handles numbers of any size)
- float (floating point real values, such as 3.14)
- complex (complex numbers)
Normally, Python will guess which type of variable you are creating based on the contents. If you wanted to create a string variable called name with the value "John", you could just type
name = "John"
For numbers, you should not use quotation marks. For example
myint = 1
mybool = False
myfloat = 3.14
Whatever you name your variables, make sure there are no spaces or special characters. If you want, you can use _ in place of a space, as in my_int.
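If you are ever unsure which type Python has guessed, the built-in type() function will tell you. The sketch below is only an illustration; the variable names are made up for this example and are not used elsewhere in the lesson.

```python
# Illustrative variables (these names are assumptions, not part of the lesson data)
my_text = "John"
my_count = 1
my_flag = False
my_ratio = 3.14

# type() reports the type Python inferred from each value
print(type(my_text))   # <class 'str'>
print(type(my_count))  # <class 'int'>
print(type(my_flag))   # <class 'bool'>
print(type(my_ratio))  # <class 'float'>
```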
Let's try a few now! In the box below, create 3 variables (a string, a bool and an integer), each on a separate line within the box. When you are done, run the code by hitting the Run button or (even easier) pressing Ctrl+Enter on your keyboard. This will store your variables in memory.
How did you do? You don't know? Well, let's ask the console to display what you wrote. In the box below, tell Python to display one of your variables by typing the print command
print(whatever_you_called_your_variable)
and then hitting Run. Notice there is no = sign.
The next thing we are going to learn about is lists.
Lists are containers that hold several variables together. The variables can all be the same type, or can be mixed.
To make a list, use the format
mylist = [variableName,variableName,variableName]
Notice, I used square brackets, rather than normal or curly ones.
Why don't you try making a list of your three new variables below?
Did you try printing them to the console? Go ahead!
Now that we have some data, what can we do with it?
Once you have stored data in your variables, you can run operations on it to transform it into different data. This is true of both numeric and alphabetic data!
Time for some good old-fashioned arithmetic. Remember that integer variable you created? Let's do some magic on that.
In the box below, on the first line
- create a new variable and assign it (set it equal to) your original integer variable + 2
On the second line
- print your new integer variable
Bravo! You can do any basic operation on an integer using the +, -, * and / operators.
You can also find the remainder of a division using % and raise a number to a specific power using **
For example:
- 20%3 will give you 2, because 20 divided by 3 is 6, remainder 2 (3 times 6 is 18).
- 20**3 will give you 8000, because 20 to the power of 3 is 8000.
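Here is a short sketch of these operators working together. The variable names are just for illustration; any integer would do.

```python
# Basic arithmetic on an integer (variable names are illustrative)
start = 20

total = start + 2        # addition: 22
remainder = start % 3    # remainder of 20 divided by 3: 2
power = start ** 3       # 20 to the power of 3: 8000
quotient = start / 3     # division with / always gives a float in Python 3

print(total, remainder, power, quotient)
```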
You don't even need to assign the value to a variable! You can use the Jupyter notebook just like a calculator by entering the operation and running the cell. Try a few below.
You can also use operators on strings, though not in exactly the same way. Try creating a new string variable that adds two string variables together below, then printing it to the console.
What happened? Did you end up with two words smooshed together? That's okay. Try again, and this time add + " " + between them. What happens if you try to subtract one from the other? Try below:
Can't fool you. You can't subtract strings, but you can run a number of different functions on them. We'll talk more about these later.
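For reference, here is a rough sketch of those two experiments using two made-up strings. The try and except lines are not covered in this lesson; they are only there so that the cell keeps running after Python complains about the subtraction.

```python
# Two made-up strings for illustration
first_word = "digital"
second_word = "humanities"

smooshed = first_word + second_word        # 'digitalhumanities'
spaced = first_word + " " + second_word    # 'digital humanities'
print(smooshed)
print(spaced)

# Subtraction is not defined for strings, so Python raises a TypeError
try:
    first_word - second_word
    subtraction_worked = True
except TypeError as error:
    subtraction_worked = False
    print("Python says:", error)
```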
Many of the operations we will undertake today can be done more efficiently with freely available libraries. However, it is important to understand the underlying processes of these libraries (what they are doing under the hood) so that you can process your text exactly the way you intend to.
We will be using these variables in today's examples, so review the box below and then run it to assign them.
my_string = "The time has come for all good men to come to the aid of their country"
my_name = "Melodee Beals"
my_bool = False
my_int = 7
my_float = 3.14
my_list = [my_name, my_bool, my_int, my_float]
print (my_list)
['Melodee Beals', False, 7, 3.14]
In order to retrieve specific information from our variables, such as a particular word or phrase, it is useful to know exactly how long our strings are. To calculate this, use the len() function.
Len, short for length, will return (give you) the number of characters in your string, or the number of items in your list. Simply place the variable name within the brackets ().
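As a quick illustration, here is len() used on a string and on a list. Both values are invented for this example.

```python
# Invented sample values for illustration
sample_title = "Middlemarch"
sample_list = ["a", "b", "c"]

title_length = len(sample_title)   # counts characters: 11
list_length = len(sample_list)     # counts items: 3

print(title_length, list_length)
```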
In the box below, calculate the length of my_name.
Note: you do not need to print; simply use the len() function on its own.
Now, think of your data as a machine-readable form, one of those forms that has a little box for each letter or number in your response. Like this:
|M|E|L|O|D|E|E| |B|E|A|L|S|
Suppose you just want the first and ninth characters of this string, my initials. To do this, you can tell the console to select only those specific characters, in this case numbers 0 and 8. In Python, and most programming languages, you start counting with 0 rather than 1.
You can create a new variable with only these letters using the variable name, followed by square brackets and the number.
initials = my_name[0]+my_name[8]
Give it a try with your own name! In the box below, reassign my_name (just as you would with a new variable) and then print your initials.
If you want several consecutive characters, you can use
variableName[a:b]
where a is the number of the first character and b is the number of the character BEYOND the last character you want. Think of it as the first character of the part that is snipped off!
You can also use negative numbers in order to count from the end of the string, with -1 being the final character.
You can also leave the first number blank to indicate "from the beginning" and the second blank to indicate "until the end".
So, the last 5 letters would be my_name[-5:]
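To see these rules side by side, here is a small sketch using the same name as above. The names on the left of each = are made up for the example.

```python
my_name = "Melodee Beals"

initials = my_name[0] + my_name[8]   # characters 0 and 8: 'MB'
first_name = my_name[0:7]            # characters 0 to 6; 7 is the first one snipped off
surname = my_name[-5:]               # the last 5 characters, through to the end
all_but_surname = my_name[:-6]       # from the beginning, stopping 6 from the end

print(initials)
print(first_name)
print(surname)
print(all_but_surname)
```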
Try out some variations below! For example:
- What is the first character?
- What are the first three characters?
- What are the 5th through the 9th characters?
- What are the last 4 characters?
Now we are going to start working with longer strings. In this case, a single sentence, my_string.
Using what you have learned above, experiment with pulling out extracts of the sentence. Using my_string, try printing the third word.
Working with sentences and longer texts on a character basis is sometimes useful, but other times you want to work on a word-by-word basis instead. To do this, we need to tokenize or cut up our string into separate pieces.
We can do this by transforming our string into a list, with each word (minus the whitespace) serving as an item in that list.
Create a new variable and assign it by using the "split method" or function on your string. The format is
my_words = my_string.split()
It is very important that you remember to include the brackets () at the end. This space is used if you wish to split your text by something other than whitespace.
When you are done, use the len() function in order to count the number of words in your list.
Use the print function in order to display your list.
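If you would like to compare notes, here is a sketch of the same steps on a different sentence. The sentence and variable names are made up for this example. The last line shows what the brackets are for: splitting by something other than whitespace, in this case a comma followed by a space.

```python
# A made-up sentence for illustration
sample_sentence = "It was the best of times, it was the worst of times"

# With empty brackets, split() cuts the string at whitespace
sample_words = sample_sentence.split()
print(sample_words)
print(len(sample_words))   # 12 words

# With something in the brackets, split() cuts at that text instead
sample_clauses = sample_sentence.split(", ")
print(sample_clauses)
```

Notice that 'times,' keeps its comma in the word list: split() removes only the whitespace, not the punctuation.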
Now that our sentence has been tokenized, it is much easier to extract a sublist of words. Create a new list and assign it to a sub-section of the sentence, a few words somewhere in the middle, and then print it.
Well done. You've split the sentence into individual words and selected a small sample. If you want to reverse the process, turning the list back into a single string, you'll need to use the .join() function. Assign a new variable using the following format:
'x'.join(y)
where x is the character you wish to use as a joiner (usually a space or underscore) and y is the name of your sublist. When you are done, print your new variable.
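As an illustration, here is the same short list joined with two different joiners. The list and variable names are invented for the example.

```python
# An invented sublist for illustration
sample_words = ["come", "for", "all", "good", "men"]

spaced = " ".join(sample_words)        # joiner is a space
underscored = "_".join(sample_words)   # joiner is an underscore

print(spaced)        # come for all good men
print(underscored)   # come_for_all_good_men
```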
Using ''.join() is a good way of combining strings (whether in lists or not), but you can also use something called a for loop.
A loop is a way of doing the same process over and over again. Below we'll explore two types of loops, a for loop and a while loop.
A for loop asks Python to do a task for each item in a group. In this case, for each word in our word list. The format of the for loop is as follows
for x in y:
    do something
Where x is a placeholder (it creates a new, temporary variable) and y is the container. For example: for word in word_list or for character in my_string
On the second line you'll need to indent (tab) and then give an instruction, such as print(), len() or ''.join().
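Here is a small sketch of both kinds of container: a list of words and a single string. The list and the counter name are made up for this example.

```python
# An invented word list for illustration
sample_words = ["aid", "of", "their", "country"]

# For each word in the list, print how many characters it has
for word in sample_words:
    print(len(word))

# A string is a container too: this loop visits each character in turn
character_count = 0
for character in "aid":
    character_count = character_count + 1
print(character_count)   # 3
```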
In the box below, write a for loop to print each word in your (full) word list.
Notice how each word is on its own line? This is because you are essentially writing a print command on its own line, over and over again. We can override this by using the parameter end='', which replaces the new line character with a different one. The format is as follows
print(x, end='y')
where x is the variable name and y is the character replacing the line break. Try it below:
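To make the difference visible, this sketch runs the same loop twice, once with the default line break and once with a space. The word list is invented for the example.

```python
# An invented word list for illustration
sample_words = ["to", "the", "aid"]

# Default behaviour: each print ends with a line break
for word in sample_words:
    print(word)

# With end=" ", each print ends with a space, so the words share one line
for word in sample_words:
    print(word, end=" ")
print()  # finish with a normal line break
```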
You can use for loops to examine and 'test' every word in your list. To do this, you use the format if x: y where x is a test statement and y is a command. For example:
if my_name == "Melodee Beals":
    print(my_name)
You can also ask more complicated questions. Maybe you just want to know if the name starts with "M". To do this, you can use the .startswith() function as follows
if my_name.startswith('M'):
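As a rough sketch, here is what .startswith() gives back for a few different tests. The variable names are made up for the example. Note that the test is case-sensitive, which matters when a sentence begins with a capital letter.

```python
sample_name = "Melodee Beals"

starts_with_m = sample_name.startswith("M")        # True
starts_with_b = sample_name.startswith("B")        # False: the string begins with 'M'
starts_with_lower_m = sample_name.startswith("m")  # False: 'm' is not the same as 'M'

# The indented line only runs when the test is True
if starts_with_m:
    print(sample_name)
```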
Using this information, create a for loop that goes through all the words in your list, and only prints those that start with "t". Choose whether you want the results to be on one line or separate ones.
You can make this a little more pleasant to look at by adding the correct punctuation. It's a little complicated to do this manually, but it will teach you a lot!
- Create a new empty list to hold your selection of "t" words, such as newlist
- Create a for loop that goes through your longer word list
  - and if the word starts with a 't'
    - append it to your new sublist using the command newlist.append(word)
- print your new list using the join() function, with a joiner of ', ' and an end parameter of "."
How did you do? Need a little help?
word_sub_list = []
for word in my_words:
    if word.startswith('t'):
        word_sub_list.append(word)
print(', '.join(word_sub_list), end='.')
Sometimes you don't want to go through an entire container, or you want to have more precision. For this, you can use while loops—while something is true, do something.
There are many ways to do this, but we'll start with the simplest, a counter.
i = 0
while i < 10:
    print(i)
    i = i + 1
This will go through all the numbers, 0 through 9, and print each in turn. After it prints the number, it will increase i by one. When it gets to ten, the condition i < 10 will no longer be true, so it will stop. Give it a try below, but instead of printing i, be creative. Maybe add i to a running total or print your name i times?
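If you want something to compare against, here is one possible take on the running total idea. The name running_total is just a suggestion.

```python
i = 0
running_total = 0

while i < 10:
    running_total = running_total + i   # add the current number to the total
    i = i + 1                           # move on to the next number

print(running_total)   # 0 + 1 + ... + 9 = 45
```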
For the last part of today's workshop, we are going to create our own, very basic concordance programme, using everything we've learned.
To write your programme you'll need to remember how to
- make lists
- make a while loop
- use an if statement
- use .startswith (to identify specific words)
- use index numbers (to extract a specific word from your list)
- append new items to a list
- print a list
Make a programme that
- goes through your list of words looking for the word to
- extracts that word and the word before and after it
- creates a new string of that three-word phrase
- adds that string to a sub-list of words
- prints your list of new strings
The list above is a big hint!
Really try, but if you get stuck
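ngram_list = []
i = 0
while i < len(my_words):
    # startswith('to') also matches longer words such as 'today'
    if my_words[i].startswith('to'):
        # max() stops the slice wrapping round to the end of the list when i is 0
        ngram = ' '.join(my_words[max(i - 1, 0):(i + 2)])
        ngram_list.append(ngram)
    i = i + 1
for ngram in ngram_list:
    print(ngram)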
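One possible variation, offered only as a sketch: if you want the word to and nothing else, you could test with == instead of .startswith(), so that a word like 'today' would be ignored. The names target and exact_list are made up for this example, and max() (which is not covered in this lesson) simply keeps the starting position from dropping below 0.

```python
my_string = "The time has come for all good men to come to the aid of their country"
my_words = my_string.split()

target = "to"      # the word we are looking for (illustrative name)
exact_list = []

i = 0
while i < len(my_words):
    if my_words[i] == target:               # exact match only
        start = max(i - 1, 0)               # never start before the first word
        phrase = " ".join(my_words[start:i + 2])
        exact_list.append(phrase)
    i = i + 1

for phrase in exact_list:
    print(phrase)
```

With this sentence, the two phrases found are 'men to come' and 'come to the'.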