CSci 284 - Homework 3
Due: Thursday, Feb.
27th, 2025
This assignment is worth 50 points. For full credit:
- submit responses to BrightSpace by 11:59pm on the due date
- see the Deliverables section below
Reading. mysql.connector Coding Examples
Part 1. Queries Using Inner Join (15 points)
The purpose of this section is to get experience using INNER JOIN
queries. For each problem, write an SQL query that computes the
requested information. All such queries will come from tables in the cs284_spring25
database. Use describe
to
familiarize yourself with the columns these tables contain.
When you turn this in, please include your code and the output (or just
part of the output if the whole set is really long). Be sure to match
the output shown, including column names.
- (3 points) Rewrite the following query using INNER JOIN:
select count(*) from cities, states where cities.state_name = states.state_name;
- (5 points) Write a query that computes the
percentage of a state's population contained in each city listed.
Display the city name, state name, populations for both, and the
percentage calculated in your result. Order by percentages in a way
that makes sense to you. Rerun with a limit of 10 for turning in.
- (7 points) Write a query that finds all PAIRS of states that start with the same letter and have population within three million of each other. The LEFT function (one of the many String Functions) will be useful. Try this one using FROM/WHERE first, then rewrite as INNER JOIN.
Part 2. Loading data using Python (15 points)
In this section you'll complete a Python program to load a non-trivial amount of data in TSV format into a new table.
- Make a subdirectory called data in your cs284
directory (or wherever you run your MySQL commands)
-
Download
the file museums.tsv into your cs284/data
directory
- From
the MySQL command line, create a table in your database called
museums
that can hold all the info from the file:
There are five columns in the file; the third one is blank for most of the museums as it refers to the college or university that the museum is associated with.
Be sure to create an auto-incrementedmuseum_id
for each museum to serve as its primary key.
- Download
the (partial!) Python program load-museums.py
into
your cs284 directory
- Modify
the program such that it will insert the 33,072 museum records
into your museums table. Do not use a LOAD DATA command.
- To turn this part in, post both the completed Python program and the results of doing select * from museums order by museum_name limit 20000,5 .
Part 3. CSV data (20 points)
We can load data from files containing CSV data using a combination of the ideas from the csv module described in Introduction to CSV Files with what you learned from the load-museums program. Create a table for the college information in data.csv and then write a Python program to read this file into the new table. Hint: use f-strings to insert each piece of data from each row into your SQL statement.
Be sure the location
and url
columns are correct in your table. Also, do not add the column headers
from the file to the table!
Deliverables
- Include the Problem number in the filename of each screenshot.
- Create a zip file containing all screenshots for each problem and post to the Assignments page on BrightSpace.