CSci 284 - Homework 3

Due: Thursday, Feb. 27th, 2025

This assignment is worth 50 points. For full credit:

Reading. mysql.connector Coding Examples

Part 1. Queries Using Inner Join  (15 points)

The purpose of this section is to get experience using INNER JOIN  queries. For each problem, write an SQL query that computes the requested information. All such queries will come from tables in the cs284_spring25 database. Use describe to familiarize yourself with the columns these tables contain.

When you turn this in, please include your code and the output (or just part of the output if the whole set is really long). Be sure to match the output shown, including column names.

  1. (3 points) Rewrite the following query using INNER JOIN:
    select count(*)
    from cities, states
    where cities.state_name = states.state_name;
  2. (5 points) Write a query that computes the percentage of a state's population contained in each city listed. Display the city name, state name, populations for both, and the percentage calculated in your result. Order by percentages in a way that makes sense to you. Rerun with a limit of 10 for turning in.
     
  3. (7 points) Write a query that finds all PAIRS of states that start with the same letter and have population within three million of each other. The LEFT function (one of the many String Functions) will be useful. Try this one using FROM/WHERE first, then rewrite as INNER JOIN.

Part 2. Loading data using Python (15 points)

In this section you'll complete a Python program to load a non-trivial amount of data in TSV format into a new table.

  1. Make a subdirectory called data in your cs284 directory (or wherever you run your MySQL commands)
     
  2. Download the file museums.tsv into your cs284/data directory
     
  3. From the MySQL command line, create a table in your database called museums that can hold all the info from the file:

    There are five columns in the file; the third one is blank for most of the museums as it refers to the college or university that the museum is associated with.
    Be sure to create an auto-incremented museum_id for each museum to serve as its primary key.
     
  4. Download the (partial!) Python program load-museums.py into your cs284 directory

  5. Modify the program such that it will insert the 33,072 museum records into your museums table. Do not use a LOAD DATA command.
     
  6. To turn this part in, post both the completed Python program and the results of doing select * from museums order by museum_name limit 20000,5 .

Part 3. CSV data (20 points)

We can load data from files containing CSV data using a combination of the ideas from the csv module described in Introduction to CSV Files with what you learned from the load-museums program. Create a table for the college information in data.csv and then write a Python program to read this file into the new table. Hint: use f-strings to insert each piece of data from each row into your SQL statement. 

Be sure the location and url columns are correct in your table. Also, do not add the column headers from the file to the table!

Deliverables