Software for CS 175, Fall 2022 
 
The software packages linked below (all in Python) will likely be useful both for 
the initial assignments and for class projects. For class projects you are welcome 
to use other software packages as well, although the packages below provide a very wide range of 
library functions and utilities for text analysis and machine learning and should be enough to support most, if not all, aspects of your project.
 Anaconda Python Distribution 
We recommend that you download and install the 
free Anaconda Python distribution with Python 3.6 or above. Anaconda includes Python, the Natural
Language Toolkit (NLTK) and scikit-learn, in addition to 
a wide range of other packages that 
are useful for data analysis (such as matplotlib, numpy, scipy, and more). Once Anaconda is installed, 
you should have many of the packages you will need for both the assignments and your class project. 
Anaconda is available for Mac, Linux, and Windows.
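As a quick sanity check after installation, the short sketch below reports which of the packages named above cannot be found in the current Python environment. The helper name `missing_packages` is hypothetical (it is not part of Anaconda or of any course material), and the list uses import names, which can differ from package names (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the packages mentioned above.
# Note: scikit-learn is imported as "sklearn".
PACKAGES = ["nltk", "sklearn", "matplotlib", "numpy", "scipy"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment.

    Hypothetical helper for illustration; it only looks the modules up
    and does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(PACKAGES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

If a package is reported as not found, installing it with conda or pip in the same environment should normally resolve it.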
 
 Python (3.6 or above) 
You should have Python 3.6 or above installed on your computer for this course. If you 
installed Anaconda (see above) with the Python 3 option, you should already have it. The 
online Python Tutorial materials are a very useful general reference.
If you are not familiar with Python, you will need to spend time learning it, e.g., via an online tutorial such as the
Beginner's Guide to Python or 
 an introductory text on Python such as 
Python Programming: An Introduction to Computer Science.  
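To confirm that the interpreter you are running meets the version requirement, a minimal check along these lines may help. The function name `python_is_recent_enough` and the constant `MIN_VERSION` are illustrative assumptions, not part of any library.

```python
import sys

# Minimum version stated for this course.
MIN_VERSION = (3, 6)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print("Running Python", sys.version.split()[0])
if not python_is_recent_enough():
    print("This is older than Python %d.%d; consider upgrading." % MIN_VERSION)
```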
 PyTorch and Related NLP Tools 
PyTorch is a powerful machine learning framework in Python that you should also download and install for this course. There are also a number of additional (optional) NLP packages that are built on top of PyTorch and that may be useful for your projects:
-   Hugging Face, a very useful publicly available set of models, datasets, and library functions that extends PyTorch and TensorFlow, for example with multiple varieties of transformer models such as BERT, DistilBERT, ALBERT, and GPT-2
-   TorchText (general-purpose NLP package built on PyTorch)
-   PyTorch-NLP (a neural network NLP package built on PyTorch)
-   AllenNLP (advanced NLP capabilities from AI2 built on PyTorch)
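Once PyTorch is installed, a small script such as the following sketch can confirm that it imports and can create a tensor. The function name `torch_summary` is hypothetical; the script degrades gracefully if PyTorch is not present, and it does not check any of the optional packages listed above.

```python
import importlib.util


def torch_summary():
    """Return a one-line description of the local PyTorch installation."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."

    import torch  # imported here so the script still runs without PyTorch

    x = torch.ones(2, 3)  # small test tensor
    return "PyTorch {} found; CUDA available: {}; test tensor shape: {}".format(
        torch.__version__, torch.cuda.is_available(), tuple(x.shape)
    )


print(torch_summary())
```

A GPU is not required for this check; `torch.cuda.is_available()` simply reports whether PyTorch can see one.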
 Python Virtual Environments 
When installing Python packages you may find it useful to use conda to 
create virtual environments that are specific to this course and/or your project, e.g., 
> conda create --name cs175 
> conda activate cs175  
> conda install --name cs175 pytorch 
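To verify from within Python that the intended environment is active, one option is to inspect the `CONDA_DEFAULT_ENV` environment variable, which conda sets when an environment is activated. This is a minimal sketch: the helper name `active_conda_env` is hypothetical, and `cs175` is the environment name used in the commands above.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None.

    Reads CONDA_DEFAULT_ENV from `environ` (defaults to os.environ).
    """
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


name = active_conda_env()
print("Active conda environment:", name)
print("Interpreter location:", sys.prefix)
if name != "cs175":
    print("The cs175 environment does not appear to be active.")
```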
 
 
 Collaboration with Project Team Members 
If you have not used GitHub before to develop code collaboratively, this project class is an ideal opportunity to learn. There is plenty of 
 online tutorial material on how to get started. It's probably helpful if at least one person on the team has used it before.