CSci 284 - Homework 2

Due: Tuesday, Feb. 17, 2026

This assignment is worth 50 points. For full credit:

Reading. Math and Aggregate Functions in SQL

Resource. SQL Examples at W3Schools

Part 1: Working with CSV Files (8 points).

The purpose of this section is to work with the programs we used to read and write CSV files using Python. I'll continue assigning small Python programs for review so when we look at larger Python programs later this semester, your Python skills aren't too rusty!

  1. Revisit the Introduction to CSV Files tutorial website. Two of the programs used the CSV file presidents.csv. If you haven't already, download this file into the same directory as your programs.

    Note: the author is not a computer professional or a CS professor. The tutorial is fine, but the comments in the example code are not always appropriate.
    Comments should always describe the program or the code block that follows the description.
     
  2. (3 points) The file presidents.csv was created in 2019, so it's missing the last two presidencies, and the death of Jimmy Carter in 2024. For this problem, you'll use a text editor to update the CSV file.
    Make sure your modified file is still a valid CSV file when saved.
    • The file is missing entries for Presidency #46 (Joe Biden) and Presidency #47 (Trump's second term). Add two rows containing all missing information for both presidents (all info is publically available, after all). You can of course copy most of Trump's information from the row for Presidency #45. Also update the left-office field for Presidency #45.
    • Former President Jimmy Carter recently died. Add his death-date and death-location information.
  3. (5 points) Copy the read_csv.py program from the tutorial to a new file formatted_read.py and modify it to do the following:

    • prompt the user for the CSV file name, so it will work with any properly-formatted CSV file, then
    • modify the way it writes the data so that it looks like the MySQL table format produced by the command SELECT * from tablename;
      For example:
    +-------+----------------------+-----------+
    | name | address | job |
    +-------+----------------------+-----------+
    | Alice | Gainesville, Florida | chef |
    | Bob   | Chicago, Illinois | writer |
    | Ted   | Miami, Florida | driver |
    | Carol | Portland, Oregon | executive |
    +-------+----------------------+-----------+

    Using Python's f-strings (formatted strings) will simplify producing the table format. Click the link for a description and some examples if you didn't study these in CSci 157. And no, it is not required that column headings be centered.

Part 2: SQL Queries  (20 points - 4 points each).

The purpose of this section is to get more experience using GROUP BY queries.

For each problem, write an SQL query that computes the requested information. All such queries will come from tables in the cs284_spring25 database. Remember to use describe to re-familiarize yourself with the fields in these tables if necessary. For each query, be sure to match the output shown, including column names, as much as possible. Note: the database content may have changed since these examples where run, so it is okay if your numbers are slightly different.

For your submission, queries and representative results can be cut and pasted into a text file, or you can take screenshots. By representative results, I mean just the first few lines, or the entire table if it isn't too long.

  1. Construct a query that generates this table:
    +-------------+-----------+-------------+
    | human_throw | bot_throw | num_matches |
    +-------------+-----------+-------------+
    | r           | r         |         245 |
    | r           | p         |         252 |
    | r           | s         |         288 |
    | p           | r         |         254 |
    | p           | p         |         255 |
    | p           | s         |         248 |
    | s           | r         |         220 |
    | s           | p         |         226 |
    | s           | s         |         260 |
    +-------------+-----------+-------------+

  2.  Modify the above so that it looks like this.  Hint: use an if statement to turn 'r' into 'rock' and so on.
  3. +-------------+-----------+-------------+
    | human_throw | bot_throw | num_matches |
    +-------------+-----------+-------------+
    | paper       | paper     |         255 |
    | paper       | rock      |         254 |
    | paper       | scissors  |         248 |
    | rock        | paper     |         252 |
    | rock        | rock      |         245 |
    | rock        | scissors  |         288 |
    | scissors    | paper     |         226 |
    | scissors    | rock      |         220 |
    | scissors    | scissors  |         260 |
    +-------------+-----------+-------------+

  4. Create a table that shows how often each human player threw each type of throw.
    Hint: start with the query SELECT player_1_id, if(player_1_throw='r',1,0) from rps_matches; and expand from there.
  5. +----------+------+-------+----------+
    | human    | rock | paper | scissors |
    +----------+------+-------+----------+
    | adejuoj0 |    7 |    24 |       20 |
    | bahao0   |   42 |    49 |       23 |
    | bellijh0 |   17 |    11 |        9 |
    | bertct0  |   19 |    15 |       16 |
    | bhandu0  |   29 |    29 |       33 |
    | carpeca0 |    9 |     8 |        9 |
    | darbnjr0 |   24 |    28 |       26 |
    | ddrinen  |    1 |     2 |        3 |
    | doylear0 |   43 |    36 |       27 |
    | hardibc0 |   18 |    20 |       17 |
    | jacksbc0 |    5 |     5 |       12 |
    | jakoba0  |   17 |    18 |       14 |
    | leek0    |  489 |   452 |      447 |
    | manacjs0 |   16 |    17 |       14 |
    | ricekg0  |   23 |     0 |        0 |
    | sulliwj0 |   26 |    43 |       36 |
    +----------+------+-------+----------+

  6. Use the DATE_FORMAT() function to re-do problem 2 from the in-class exercise so that it looks like this:
  7. +----------+-------------------------+-------------------------+
    | human    | first_match             | most_recent             |
    +----------+-------------------------+-------------------------+
    | adejuoj0 | Thu, Jan 18 at 8:09 PM  | Thu, Jan 18 at 8:11 PM  |
    | bahao0   | Thu, Jan 18 at 9:59 AM  | Fri, Jan 19 at 8:34 AM  |
    | bellijh0 | Fri, Jan 19 at 7:58 AM  | Fri, Jan 19 at 8:03 AM  |
    | bertct0  | Thu, Jan 18 at 1:44 PM  | Thu, Jan 18 at 1:46 PM  |
    | bhandu0  | Thu, Jan 18 at 11:51 AM | Thu, Jan 18 at 11:58 AM |
    | carpeca0 | Thu, Jan 18 at 12:24 PM | Thu, Jan 18 at 12:47 PM |
    | darbnjr0 | Thu, Jan 18 at 2:10 PM  | Thu, Jan 18 at 2:16 PM  |
    | ddrinen  | Thu, Jan 18 at 8:56 AM  | Thu, Jan 18 at 9:35 AM  |
    | doylear0 | Thu, Jan 18 at 6:31 PM  | Fri, Jan 19 at 8:34 AM  |
    | hardibc0 | Thu, Jan 18 at 3:42 PM  | Thu, Jan 18 at 3:45 PM  |
    | jacksbc0 | Fri, Jan 19 at 8:01 AM  | Fri, Jan 19 at 8:02 AM  |
    | jakoba0  | Thu, Jan 18 at 6:53 PM  | Thu, Jan 18 at 6:56 PM  |
    | leek0    | Thu, Jan 18 at 3:10 PM  | Thu, Jan 18 at 3:32 PM  |
    | manacjs0 | Thu, Jan 18 at 9:55 AM  | Thu, Jan 18 at 10:00 AM |
    | ricekg0  | Thu, Jan 18 at 1:33 PM  | Thu, Jan 18 at 1:34 PM  |
    | sulliwj0 | Fri, Jan 19 at 12:04 AM | Fri, Jan 19 at 8:34 AM  |
    +----------+-------------------------+-------------------------+

  8. In the cities table, there are several city names that appear in multiple states. For example, there is a Cleveland, OH and a Cleveland, TN. Springfield is also a popular city name.
    Write a query that determines how many city names appear in multiple states (the answer should be 62).

    Now, format your table so that it looks like the one below.  HINT: you'll want to use at least one GROUP_CONCAT, and also a CONCAT inside the GROUP_CONCAT 
    +--------------+--------------------------------------------------------------------------------------------+
    | city_name    | states_and_pops                                                                            |
    +--------------+--------------------------------------------------------------------------------------------+
    | Albany       | New York (98424), Georgia (76185), Oregon (51583)                                          |
    | Alexandria   | Virginia (148892), Louisiana (48426)                                                       |
    | Apple Valley | California (70924), Minnesota (50201)                                                      |
    | Auburn       | Washington (74860), Alabama (58582)                                                        |
    | Aurora       | Colorado (345803), Illinois (199963)                                                       |
    
    ...
    
    | Warren       | Michigan (134873), Ohio (40768)                                                            |
    | Westminster  | Colorado (110945), California (91739)                                                      |
    | Wilmington   | North Carolina (112067), Delaware (71525)                                                  |
    +--------------+--------------------------------------------------------------------------------------------+

Part 3: Subqueries (22 points)

  1. (3 points) Find the ids of all the posts that are longer than average (in terms of number of characters in the content). Hint: there will be 496 of them
  2. (3 points) Find the ids of all posts whose content is longer than the average content but whose title is shorter than the average title.  Hint: there are 263
  3. (6 points) Find the author_id numbers of all the authors who made at least five posts whose content is longer than average but whose title is shorter than the average.
  4. (10 points) Find the names of all the authors whose id number you found in the previous question. List them in alphabetical order by last name.

Deliverables