With ...
> SELECT id, color, name FROM shirts LIMIT 5
id color name
1 blue Jane
2 yellow Susan
3 green Craig
4 yellow Susan
5 blue Susan
... form a SQL statement which would find
- the number of shirts of each
color
- how many people (
names
) havecolor: blue
shirts - how many people (
names
) have more than one color of shirts
(examples)
Answers may vary but sufficient examples are ...
-
> SELECT color, count(id) FROM shirts GROUP BY color
-
> SELECT DISTINCT(name) FROM shirts WHERE color='blue'
-
> SELECT name FROM (SELECT name, COUNT(DISTINCT(color)) AS color_count FROM shirts GROUP BY name) WHERE color_count>1
... with an easy replication test via ...
$ pip install duckdb pandas
$ ipython
Python 3.11.2 (main, Feb 16 2023, 03:07:35) [Clang 14.0.0 (clang-1400.0.29.202)]
IPython 8.12.0 -- An enhanced Interactive Python. Type '?' for help.
# setup
In [1]: import duckdb
In [2]: import pandas as pd
In [3]: shirts = pd.DataFrame([{"id":1, "color":"blue", "name":"Jane"},{"id":2, "color":"yellow", "name":"Susan"},{"id":3, "color":"green", "name":"Craig"},{"id":4, "color":"yellow", "name":"Susan"},{"id":5, "color":"blue", "name": "Susan"}])
# loaded table
In [4]: print(duckdb.sql("SELECT * FROM shirts LIMIT 5").df())
id color name
0 1 blue Jane
1 2 yellow Susan
2 3 green Craig
3 4 yellow Susan
4 5 blue Susan
# answers
# π (1)
In [5]: print(duckdb.sql("SELECT color, count(id) FROM shirts GROUP BY color").df())
color count(id)
0 blue 2
1 yellow 2
2 green 1
# π (2)
In [12]: print(duckdb.sql("SELECT DISTINCT(name) FROM shirts WHERE color='blue'").df())
name
0 Jane
1 Susan
# π (3)
In [21]: print(duckdb.sql("SELECT name FROM (SELECT name, COUNT(DISTINCT(color)) AS color_count FROM shirts GROUP BY name) WHERE color_count>1").df())
name
0 Susan