Software for CS 175, Fall 2022
The software packages linked below (all in Python) will likely be useful to you both for
the initial assignments and for class projects. For the class projects you are welcome
to use other software packages as well, although the packages below provide a wide range of
library functions and utilities for text analysis and machine learning and should be enough to support most, if not all, aspects of your project.
Anaconda Python Distribution
We recommend that you download and install the
free Anaconda Python distribution with Python 3.6 or above. Anaconda includes Python, the Natural
Language Toolkit (NLTK) and scikit-learn, in addition to
a wide range of other packages that
are useful for data analysis (such as matplotlib, numpy, scipy, and more). If you download Anaconda
you should have many of the packages you will need for both the assignments and for your class project.
Anaconda is available for Mac, Linux, and Windows, and bundles Python itself along with many other libraries.
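After installing Anaconda, a quick way to confirm that the libraries mentioned above are available is to ask Python whether each one can be found. The following is a minimal sketch rather than part of the official course materials: the helper name `check_packages` and the list `COURSE_PACKAGES` are hypothetical, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the libraries mentioned above
# (scikit-learn is imported as "sklearn").
COURSE_PACKAGES = ["nltk", "sklearn", "matplotlib", "numpy", "scipy"]


def check_packages(names):
    """Map each top-level import name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report the status of each package without actually importing it.
for name, found in check_packages(COURSE_PACKAGES).items():
    print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Any package reported as missing can usually be added with `conda install`.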
Python (3.6 or above)
You should have Python 3.6 or above installed on your computer for this course. If you
installed Anaconda with the Python 3 option (see above), you should already have it. The
online Python Tutorial materials are a very useful general reference.
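If you are unsure which version of Python you are running, a short script can tell you. This is a minimal sketch; the function name `python_is_recent_enough` is hypothetical, and the threshold of 3.6 is taken from the requirement stated above.

```python
import sys

MIN_VERSION = (3, 6)  # minimum version stated for this course


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Show the running interpreter's version and warn if it is too old.
print("Running Python", sys.version.split()[0])
if not python_is_recent_enough():
    print("This interpreter is older than Python 3.6; consider upgrading.")
```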
If you are not familiar with Python you will need to spend time learning it, e.g., via an online tutorial such as the
Beginner's Guide to Python or
an introductory text on Python such as
Python Programming: An Introduction to Computer Science.
PyTorch and Related NLP Tools
PyTorch is a powerful machine learning framework in Python that you should also download and install for this course. There are also a number of additional (optional) NLP packages that are built on top of PyTorch and that may be useful for your projects:
- Hugging Face, a very useful publicly available set of models, datasets, and library functions that extends PyTorch and TensorFlow, for example with multiple varieties of transformer models such as BERT, DistilBERT, ALBERT, and GPT-2
- TorchText (general purpose NLP package built on PyTorch)
- PyTorch-NLP (a neural network NLP package built on PyTorch)
- AllenNLP (advanced NLP capabilities from AI2 built on PyTorch)
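Once PyTorch is installed, a small smoke test can confirm that it imports and can do basic tensor arithmetic. The sketch below is illustrative only: the function name `pytorch_smoke_test` is hypothetical, and it returns `None` rather than failing if PyTorch is not installed.

```python
def pytorch_smoke_test():
    """Multiply a small matrix by its transpose, or return None without PyTorch."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment
    x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
    product = x @ x.T  # 2x2 matrix product
    return product.tolist()


result = pytorch_smoke_test()
if result is None:
    print("PyTorch is not installed.")
else:
    print("PyTorch works; x @ x.T =", result)
```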
Python Virtual Environments
When installing Python packages you may find it useful to use conda to
create virtual environments that are specific to this course and/or your project, e.g.,
> conda create --name cs175
> conda activate cs175
> conda install --name cs175 pytorch
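To check from within Python that you are running inside the intended environment, you can inspect the `CONDA_DEFAULT_ENV` environment variable, which conda sets when an environment is activated. This is a minimal sketch; the helper name `active_conda_env` is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is active."""
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV") or None


# Print the environment name and where the running interpreter lives.
print("Active conda environment:", active_conda_env())
print("Interpreter location:", sys.prefix)
```

If the name printed is not the environment you created for the course, activate it before installing packages.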
Collaboration with Project Team Members
If you have not used GitHub before to develop code collaboratively, this project class would be an ideal opportunity to learn to use it. There is plenty of
online tutorial material on how to get started. It's probably helpful if at least one person on the team has used it before.