ICS 184 / EECS 116 Summer 2006
Assignment #1
Due date and time: Wednesday, July 12, 11:59pm
Introduction
A database is simply defined as a collection of information relevant to some person, some program, or some organization. The collection can be as straightforward as a homogeneous list of employees and their job titles, or as complex as the complete set of interrelated information maintained by a bank about its customers, employees, assets, and investments.
In applications of a realistic scale, merely understanding the application's data requirements can be a challenge. The other half of the battle, turning these requirements into a software system that manages the data, is even more difficult if done from scratch. Fortunately, there are well-understood techniques for performing what's called data modeling, a conceptualization of what data needs to be stored, and how data relates to other data. Furthermore, there are existing software systems called database management systems that are adept at storing and manipulating data modeled using such techniques. The existence of data modeling techniques and database management systems greatly reduces the work required to build an application that requires a database.
There are multiple data modeling techniques in use, but by far the most prevalent is called the relational model, which we've been discussing in lecture. In the relational model, data is organized into relations with one or more attributes, with each relation being a set of tuples, and each tuple containing a unique grouping of values for the attributes. For example, a database for a hospital might contain a relation called patient, with attributes like patient_id, name, address, and admitted_date. Each tuple in the relation would contain information about a particular patient. Two tuples could have the same value for some of the attributes — such as two patients with the same name and/or address — but not for all of them.
Relational databases excel at managing the interrelationships between data in different relations. In our hospital example, a relation called visit might consist of attributes like patient_id and doctor_id, along with other information such as reason and visit_date. The presence of the patient_id and doctor_id attributes relates information about each instance of a patient visiting a doctor with identifying information about the patient and the doctor, without requiring that identifying information to be duplicated in the visit relation. Later this quarter, we'll explore issues, such as duplication of information, that arise in the design of a relational database.
For this assignment, I'd like you to get some practice formulating queries for a relational database that has already been designed. I'm asking you to use relational algebra to formulate your queries, so that you can focus on the task of selecting just the right information, joining together related data from multiple relations when necessary, without worrying about the syntactic and semantic details of programmatic query languages like SQL, which can sometimes be unintuitive. Subsequent assignments will encourage you to explore other parts of the broader task of data management, such as expressing and testing queries and data definitions programmatically using SQL, modeling data for the purposes of designing a database, and connecting to a database from another program.
The database
For this assignment, we'll work with a database underlying the operations of a hypothetical company that manages multiple automobile dealerships throughout the United States.
The following are the data requirements of our enterprise.
Keeping in mind the data requirements described above, the database has been designed to consist of the following relations, with the attributes making up the primary key of each relation underlined. Spend some time looking over these definitions, and make sure you understand why the primary key was chosen as it was. (For example, why is social_security_number not included in the primary key of sale?)
car(vin, make, model, year, mileage, asking_price, invoice_price)
dealership(dealership_id, dealership_name, dealership_street_address, dealership_city, dealership_state)
salesperson(salesperson_id, salesperson_name)
customer(social_security_number, customer_name, customer_street_address, customer_city, customer_state)
reports_to(salesperson_id, managing_salesperson_id)
works_at(salesperson_id, dealership_id, month_worked, base_salary_for_month)
inventory(vin, dealership_id)
sale(vin, social_security_number, salesperson_id, dealership_id, sale_price, sale_date)
The domain of the month_worked attribute of the works_at relation is unique months, such as March 2006 or September 2008.
The queries
Write a relational algebra expression for each of the following queries, which are intended to operate on a database designed as described in the previous section of this assignment. For each query, any legal relational algebra expression that yields the correct result is acceptable.
You are permitted to use the assignment operation to create temporary relations for the purposes of simplifying your queries, but it is not required.
Deliverables
Write all of your queries in a single file in a Microsoft Word (.doc), Rich Text Format (.rtf), or PDF document. Please be sure to use the appropriate Greek characters and mathematical symbols, rather than making up your own equivalents; this will help us greatly in grading the assignment, especially since there are multiple correct solutions to most of these problems. Submit the one file containing your queries and nothing else.
Follow this link for a discussion of how to submit your assignment. Be aware that I'll be holding you to all of the rules specified in that document, including the one that says that you're responsible for submitting the version of the assignment that you want graded. We won't regrade your assignment simply because you submitted the wrong version by accident.