Last active
July 22, 2020 18:29
-
-
Save quasiben/f410e7b3e0467c8569d8eaf2dee43d25 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Query 01 | |
Find top 100 products that are sold together frequently in given stores. Only products in certain categories | |
sold in specific stores are considered, and "sold together frequently" means at least 50 customers bought | |
these productstogether in a transaction. | |
Query 02 | |
Find the top 30 products that are mostly viewed together with a given product in online store. Note that | |
the order of products viewed does not matter, and "viewed together" relates to a web_clickstreams | |
click_session of a known user with a session timeout of 60min.If the duration between two click of a user is | |
greater then the session timeout, a new session begins. With a session timeout of 60min. | |
Query 03 | |
For a given product get a top 30 list sorted by number of views in descending order of the last 5 products | |
that are mostly viewed before the product was purchased online. For the viewed products, consider only | |
products in certain item categories and viewed within 10days before the purchase date. | |
Query 04 | |
Web_clickstream shopping cart abandonment analysis: For users who added products in their shopping | |
carts but did not check out in the online store during their session, find the average number of pages they | |
visited during their sessions. A "session" relates to a click_session of a known user with a session time-out | |
of 60min.If the duration between two clicks of a user is greater then the session time-out, a new session | |
begins. | |
Query 05 | |
Build a model using logistic regression for a visitor to an online store: based on existing users online | |
activities (interest in items of different categories) and demographics. This model will be used to predict if | |
the visitor is interested in a given item category. Output the precision, accuracy and confusion matrix of | |
model. | |
Note: no need to actually classify existing users, as it will be later used to predict interests of unknown | |
visitors. | |
Query 06 | |
Identifies customers shifting their purchase habit from store to web sales. Find customers who spend in | |
relation more money in the second year following a given year in the web_sales channel then in the store | |
sales channel. Report customers details: first name, last name, their country of origin, login name and email | |
address) and identify if they are preferred customer, for the top 100 customers with the highest increase in | |
their second year web purchase ratio. | |
Query 07 | |
List top 10 states in descending order with at least 10 customers who during a given month bought | |
products with the price tag at least 20% higher than the average price of products in the same category. | |
Query 08 | |
For online sales, compare the total sales monetary amount in which customers checked online reviews | |
before making the purchase and that of sales in which customers did not read reviews. Consider only | |
online sales for a specific category in a given year. | |
Query 09 | |
Aggregate total amount of sold items over different given types of combinations of customers based on | |
selected groups of marital status, education status, sales price and different combinations of state and sales | |
profit. | |
Query 10 | |
For all products, extract sentences from its product reviews that contain positive or negative sentiment and | |
display for each item the sentiment polarity of the extracted sentences (POS OR NEG) and the sentence and | |
word in sentence leading to this classification. | |
Query 11 | |
For a given product, measure the correlation of sentiments, including the number of reviews and average | |
review ratings, on product monthly revenues within a given time frame. | |
Query 12 | |
Find all customers who viewed items of a given category on the web in a given month and year that was | |
followed by an instore purchase of an item from the same category in the three consecutive months. | |
Query 13 | |
Display customers with both store and web sales in consecutive years for whom the increase in web sales | |
exceeds the increase in store sales for a specified year. | |
Query 14 | |
What is the ratio between the number of items sold over the internet in the morning (7 to 8am) to the | |
number of items sold in the evening (7 to 8pm) of customers with a specified number of dependents. | |
Consider onlywebsites with a high amount of content. | |
Query 15 | |
Find the categories with flat or declining sales for in store purchases during a given year for a given store. | |
Query 16 | |
Compute the impact of an item price change on the store sales by computing the total sales for items in a | |
30 day period before and after the price change. Group the items by location of warehouse where they | |
were delivered from. | |
Query 17 | |
Find the ratio of items sold with and without promotions in a given month and year. Only items in certain | |
categories sold to customers living in a specific time zone are considered. | |
Query 18 | |
Identify the stores with flat or declining sales in 4 consecutive months, check if there are any negative | |
reviews regarding these stores available online. | |
Query 19 | |
Retrieve the items with the highest number of returns where the number of returns was approximately | |
equivalent across all store and web channels (within a tolerance of +/ 10%), within the week ending given | |
dates. Analyse the online reviews for these items to see if there are any negative reviews. | |
Query 20 | |
Customer segmentation for return analysis: Customers are separated along the following dimensions: | |
return frequency, return order ratio (total number of orders partially or fully returned versus the total | |
number of orders), return item ratio (total number of items returned versus the number of items | |
purchased), return amount ration (total monetary amount of items returned versus the amount purchased), | |
return order ratio. Consider the store returns during a given year for the computation. | |
Query 21 | |
Get all items that were sold in stores in a given month and year and which were returned in the next 6 | |
months and repurchased by the returning customer afterwards through the web sales channel in the | |
following three years. For those items, compute the total quantity sold through the store, the quantity | |
returned and the quantity purchased through the web. Group this information by item and store. | |
Query 22 | |
For all items whose price was changed on a given date, compute the percentage change in inventory | |
between the 30day period BEFORE the price change and the 30day period AFTER the change. Group this | |
information by warehouse. | |
Query 23 | |
This Query contains multiple, related iterations: Iteration 1: Calculate the coefficient of variation and mean | |
of every item and warehouse of the given and the consecutive month. Iteration 2: Find items that had a | |
coefficient of variation of 1.3 or larger in the given and the consecutive month | |
Query 24 | |
For a given product, measure the effect of competitor's prices on products' instore and online sales. | |
Compute the crossprice elasticity of demand for a given product. | |
Query 25 | |
Customer segmentation analysis: Customers are separated along the following key shopping dimensions: | |
recency of last visit, frequency of visits and monetary amount. Use the store and online purchase data | |
during a given year to compute. After model of separation is build, report for the analysed customers to | |
which "group" they where assigned. | |
Query 26 | |
Cluster customers into book buddies/club groups based on their in store book purchasing histories. After | |
model of separation is build, report for the analysed customers to which "group" they where assigned. | |
Query 27 | |
For a given product, find "competitor" company names in the product reviews. Display review id, | |
product id, "competitor’s" company name and the related sentence from the online review | |
Query 28 | |
Build text classifier for online review sentiment classification (Positive, Negative, Neutral), using 90% of | |
available reviews for training and the remaining 10% for testing. Display classifier accuracy on testing data | |
and classification result for the 10% testing data: <reviewSK>,<originalRating>,<classificationResult>. | |
Query 29 | |
Perform category affinity analysis for products purchased together online. Note that the order of products | |
viewed does not matter, | |
Query 30 | |
Perform category affinity analysis for products viewed together online. Note that the order of products | |
viewed does not matter, and "viewed together" relates to a click_session of a user with a session timeout of | |
60min. If the duration between two clicks of a user is greater then the session timeout, a new session | |
begins. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment