CSci 284 - Homework 3
Due: Thursday, Feb.
26, 2026
This assignment is worth 50 points. For full credit:
- submit responses to BrightSpace by 11:59pm on the due date
- all Python programs need a mulit-line file comment at the top (use triple quotes); include the program name, assignment number, and of course your own name
- see the Deliverables section below
Reading. mysql.connector Coding Example
Part 1. Queries Using Inner Join (15 points)
The purpose of this section is to get experience using INNER JOIN
queries. For each problem, write an SQL query that computes the
requested information. All such queries will come from tables in the cs284_spring26
database. Use DESCRIBE to familiarize
yourself with these tables.
When you turn this in, please include your code and the output (or just
part of the output if the whole set is really long). Be sure to match
the output shown, including column names.
- (3 points) Rewrite the following query using INNER JOIN:
select count(*) from cities, states where cities.state_name = states.state_name; - (5 points) Write a query that computes the percentage of a state's
population contained in each city listed. Display the city name, state
name, populations for both, and the percentage calculated in your
result. Order by percentages in a way that makes sense to you. Rerun
with a limit of 10 for turning in.
- (7 points) Write a query that finds all PAIRS of states that start with the same letter and have population within three million of each other. The LEFT function (one of the many String Functions) will be useful. Try this one using FROM/WHERE first, then rewrite as INNER JOIN.Take a screenshot of both attempts and their output.
Part 2. Loading data using Python (21 points)
In this section you'll complete a Python program to load a non-trivial amount of data in TSV format into a new table.
- On your Linux account, make a new subdirectory called data
in your cs284 directory (or wherever you run your MySQL
commands)
- Download
the file museums.tsv into your cs284/data
directory
- Run
the MySQL client. From the command line, create a table in your
database called
museumsthat can hold all the info from the file:
There are five columns in the file; the third one is blank for most of the museums as it refers to the college or university that the museum is associated with.
Be sure to create an auto-incrementedmuseum_idfor each museum to serve as its primary key.
Screenshot the results of callingDESCRIBEon your new table.
- Download
the (partial!) Python program load-museums.py
into
your cs284 directory
- Modify
the program such that it will insert the 33,072 museum records
into your museums table. Do not use a LOAD DATA command. After
insertion is complete, print the final count.
- For
full credit, submit both the completed Python program and a
screenshot showing the results of these queries:
- SELECT
museum_name, count(*) FROM museums;
- SELECT * FROM museums ORDER BY museum_name LIMIT 20000,5;
- SELECT museum_name, university FROM museums WHERE city = 'SAN ANTONIO';
This last LIMIT specifies the offset first, then the number of rows to display.
- SELECT
museum_name, count(*) FROM museums;
Part 3. CSV data (14 points)
We can load data from files containing CSV data using a combination of the ideas from the csv module described in Introduction to CSV Files with what you learned from the load-museums program. Create a table for the college information in data.csv and then write a Python program to read this file into the new table. Hint: use f-strings to insert each piece of data from each row into your SQL statement.
Be sure the location and url
columns are correct in your table. Also, do not add the column headers
from the file to the table!
Make a screenshot
of the results of calling DESCRIBE on your new
table and a SELECT
* query.
Deliverables
- Include the Problem number in the filename of each screenshot.
- Create a zip file containing the screenshots and Python files for each problem and post to the Assignments page on BrightSpace.