STANFORD

Ling 235: Quantitative and Probabilistic Explanation in Linguistics
Handout #1: Winter 2005 General Information


Course Information

Lecture: MW 3:15-4:45, Building 300, Room 303.  Units: 3-4.

Course Description

Overview:

This course will cover quantitative modeling and analysis of language.  Quantitative modeling and analysis is becoming increasingly important in both competence- and performance-based approaches to language, including formal linguistics, sociolinguistics, and psycholinguistics, because of its ability to explain noncategorical distributional patterns.  The course will cover literature in all three of these fields, focusing on issues including variation in word order, argument realization, phonological realization, and online sentence processing.  We'll give course participants the tools to understand and formulate quantitative models that explain linguistic data, and to test those models rigorously against empirical data.  Along the way, we'll provide tutorial material in statistical methods crucial for all sorts of linguistic work, including contingency tables, linear regression, generalized linear models/logistic regression, and stochastic Optimality Theory.

 

Organization:

Part of the class meeting time will be devoted to seminar-style coverage of “linguistics” – discussion-oriented sessions where we look closely at linguistics articles that use quantitative models – and part will be devoted to tutorial-style coverage of “statistics” – foundational and tutorial material covering the ideas and methods needed to do quantitative linguistic research.  For the “statistics” sessions, we will be using the SPSS statistics software package.  There will be small weekly assignments to help solidify understanding of the material we cover in the statistics sessions.

 

What this course will not cover:

The course will not cover any kind of data-gathering techniques.  We will assume that you have or can get some raw data to work with. If you think you need more skills in collecting data, you should consider another course that covers this, such as Ling 203 or Ling 155B.  You might also benefit from contacting Neal Snider, the corpus TA, who can help you with browsing and collecting data from the large variety of natural language corpora that are available at Stanford.  Take a look at the Linguistics corpora webpage for more information: http://www.stanford.edu/dept/linguistics/corpora/

Course Objective

For students to be able to understand, build, and test probabilistic models of linguistic phenomena.

Contact Info        

Christopher Manning
Office: Gates Bldg. Room 158
Office Hours: Tue 3-4, Wed 2-3
Phone: (650) 723-7683
Fax: (650) 725-2588
E-mail: manning@stanford.edu

Roger Levy
Office: Gates Bldg. Room 114
Office Hours: Mon 10-11, Thurs 10-11
Phone: (650) 725-6965
Fax: (650) 723-5666
E-mail: rog@stanford.edu

 

Prerequisites

  • A course in syntax
  • Some quantitative data, or the knowledge and means for collecting some

Intended Audience

Graduate students and advanced undergraduates specializing in linguistics or symbolic systems.

 

Reading and Work

Pre-reading

If you’d like a head start on things that would be good to know for this course, here are our recommendations:

  • If you know little about probability, then it’d be good to know more! One good place to start would be John Goldsmith’s tutorial, Probability for linguists, available in Microsoft Word or converted to HTML.  Much the same material, plus some additional material, is covered more concisely in chapter 2 of Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 1999. You can read the first few pages at Amazon.
  • If you’d like to get into the kind of material that will be dealt with in this course, here’s a survey chapter on Probabilistic Syntax by Chris, available in PDF or PS format.

Reading

There is no required text.  Readings and handouts will be distributed. Copies of materials available electronically will be posted here, and hard copies will be distributed in class.

Obtaining and using SPSS

You can use SPSS on a Windows, Macintosh, or Unix machine.  SPSS is pre-installed on one of the Windows machines and one of the Macintosh machines in the Linguistics department computer cluster.  A relatively old version is also pre-installed on most of the Sweet Hall SUN UNIX machines, including the elaines, the trees, and junior.  Copies are also available in the library: on Windows machines 1-6 in the Jonsson Reading Room, and on Macs in Meyer Library.  However, we highly recommend that you obtain your own current copy for use on your own computer.  The best way to do this seems to be to license a full version of SPSS for a year through Stanford at the following website:

              http://www.stanford.edu/services/softwarelic/product_spss.html

It’s cheaper to buy yearly licenses in bulk, so please let us know soon if you want to get a yearly license and we’ll make a group order. (Note: the "student" edition of SPSS is too crippled to be useful; the "gradpack" is perfectly functional.)

Finally, you can download a free 14-day evaluation copy of SPSS for Windows through the SPSS website: http://www.spss.com

Work and Grading

The course work will be:

  • 6 assignments [written, may require use of computers/Web] (10% each)
  • Presentation of a paper or part of a paper (0%)
  • 1 final paper prospectus (0%)
  • 1 final project. This can be on a topic of your choice within the domain of the course. The form of the project is to design an assignment based around a dataset and a quantitative analysis of it, and to provide a solution to the assignment (40%)

You are also expected to read the assigned papers and to participate in class presentations.