https://mode.com/sql-tutorial/understanding-search-functionality/
Understanding Search Functionality | SQL Analytics Training - Mode
In this lesson we'll cover: Before starting, be sure to read the overview to learn a bit about Yammer as a company. The product team is determining priorities for the next development cycle and they are considering improving the site's search functionality
mode.com
https://mode.com/sql-tutorial/understanding-search-functionality-answers/
Understanding Search Functionality: Answers | SQL Analytics Training - Mode
In this lesson we'll cover: Developing hypotheses Framing problems simply and correctly can often save time later on. Thinking about the ultimate purpose of search right off the bat can make it easier to evaluate other parts of them problem. Search, at the
mode.com
Yammer의 프로덕트 팀은 자사의 검색기능을 개선할려고 한다. 하지만 작업 전에 실제로 검색기능을 사용하는 유저의 수가 많은지, 검색기능이 제대로 작동하고 있는지, 어떤 점을 개선해야하는지를 파악해서 적은 자원으로 최대의 효율을 얻고자 한다.
주요 event_name
Table1 : Users
사용자당 하나의 행과 해당 사용자 계정에 대한 설명이 포함되어 있습니다.
user_id | 사용자별 고유 ID |
created_at | 사용자가 최초 생성된 시각 |
state | 사용자 상태(회원가입 완료 or 회원가입 승인 대기) |
activated_at | 사용자가 회원가입 승인 되 시각 |
company_id | 사용자 회사의 ID |
language | 사용자 선택 언어 |
Table2 : Events
이벤트당 1개의 행이 포함됩니다. 이벤트는 사용자가 Yammer에서 수행한 액션입니다. 로그인, 메시징, 검색 등의 이벤트가 있습니다.
user_id | 이벤트를 로깅하는 사용자 ID |
occurred_at | 이벤트가 발생한 시각 |
event_type | 회원가입 절차에서 기록되는 signup_flow와 회원가입 후 일반적인 제품 사용을 나타내는 engagement 두가지가 있습니다. |
event_name | 사용자가 수행한 특정 액션입니다.사용 가능한 값은 다음과 같습니다. create_user: 사용자가 등록 프로세스 중에 Yammer 데이터베이스에 추가됩니다.enter_email:사용자는 이메일 주소를 입력하여 등록 프로세스를 시작합니다.enter_info: 사용자는 등록 프로세스 중에 이름과 개인정보를 입력합니다.complete_signup:사용자는 등록/인증 프로세스 전체를 완료한다.home_page:사용자가 홈 페이지를 로드합니다.like_message:사용자가 다른 사용자의 메시지를 좋아합니다.로그인:사용자가 Yammer에 로그인하다search_autocomplete:사용자가 자동 완성 목록에서 검색 결과를 선택합니다.search_run: 사용자가 검색 쿼리를 실행하여 검색 결과 페이지로 이동합니다.search_click_result_X: 사용자가 결과 페이지에서 검색 결과 X 를 클릭합니다.X 는 1 ~10 의 숫자입니다. send_message:사용자가 메시지를 투고하다view_inbox: 사용자가 받은 편지함의 메시지를 봅니다. |
location | 이벤트가 로그에 기록된 국가(IP 주소를 통해 수집). |
device | 이벤트 로그에 사용되는 디바이스 유형. |
세션은 두 이벤트 사이에 10분이상의 휴식 없이 사용자가 기록한 이벤트로 정의됩니다. 따라서 사용자가 이벤트를 기록하지 않고 10분을 경과하면 세션이 종료되고 다음 이벤트는 새로운 세션으로 간주됩니다.
1.
Q : autocomplete, full search를 사용하는 세션은 얼마나 되나?
A : search_autocomplete은 전체세션의 약 25%에서 사용되는 반면 search_run은 전체 세션에서 8% 정도만 사용됩니다
전체 유저의 25%는 Yammer에서 검색기능을 사용할 일이 있습니다. 즉 검색을 필요로 하는 유저가 존재한다는 뜻입니다.
WITH bounds AS (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id, occurred_at
),
final AS (
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR next_event >= INTERVAL '10 MINUTE' OR last_event IS NULL OR next_event IS NULL
),
session AS (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM final
GROUP BY user_id, session
),
x AS (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN session ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
AND e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement'
),
z AS (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM x
GROUP BY x.session_start, x.session, x.user_id
)
SELECT DATE_TRUNC('week', z.session_start) AS week,
COUNT(*) AS sessions,
COUNT(CASE WHEN z.autocompletes > 0 THEN z.session ELSE NULL END) AS with_autocompletes,
COUNT(CASE WHEN z.runs > 0 THEN z.session ELSE NULL END) AS with_runs
FROM z
GROUP BY week
ORDER BY week
2-1.
Q : 자동 완성 기능을 사용한 사람들은 한 세션 내에서 몇 번 정도 해당 기능을 사용하나?
A : autocomplete 기능을 사용한 세션 중 autocomplete 기능을 1번만 사용한 세션은 58.6%
WITH bounds AS (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events
WHERE event_type = 'engagement'
ORDER BY user_id, occurred_at
),
final AS (
SELECT *,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR next_event >= INTERVAL '10 MINUTE' OR last_event IS NULL OR next_event IS NULL
),
session AS (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM final
GROUP BY user_id, session
),
x AS (
SELECT e.*,
session,
session_start
FROM tutorial.yammer_events e
LEFT JOIN session s ON e.user_id = s.user_id
AND e.occurred_at >= s.session_start
AND e.occurred_at <= s.session_end
WHERE e.event_type = 'engagement'
),
z AS (
SELECT session_start,
session,
user_id,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN event_name LIKE 'search_click_%' THEN user_id ELSE NULL END) AS clicks
FROM x
GROUP BY session_start, session, user_id
)
SELECT autocompletes,
COUNT(*) AS sessions
FROM z
WHERE autocompletes > 0
GROUP BY autocompletes
ORDER BY autocompletes
2-2.
Q : full search를 사용한 사람들은 한 세션 내에서 몇 번 정도 해당 기능을 사용하나?
A : full search를 1번만 사용한 사람들은 적다. 대부분의 사람들이 2번 이상 사용
WITH bounds AS (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events
WHERE event_type = 'engagement'
ORDER BY user_id, occurred_at
),
final AS (
SELECT *,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR next_event >= INTERVAL '10 MINUTE' OR last_event IS NULL OR next_event IS NULL
),
session AS (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM final
GROUP BY user_id, session
),
x AS (
SELECT e.*,
session,
session_start
FROM tutorial.yammer_events e
LEFT JOIN session s ON e.user_id = s.user_id
AND e.occurred_at >= s.session_start
AND e.occurred_at <= s.session_end
WHERE e.event_type = 'engagement'
),
z AS (
SELECT session_start,
session,
user_id,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN event_name LIKE 'search_click_%' THEN user_id ELSE NULL END) AS clicks
FROM x
GROUP BY session_start, session, user_id
)
SELECT runs,
COUNT(*) AS sessions
FROM z
WHERE runs > 0
GROUP BY runs
ORDER BY runs
3-1.
Q : Full search를 한 사용자들이 검색 결과를 클락하는가?
A : Full search를 사용한 세션 중에서 검색결과를 클릭하지 않은 세션이 절반이 넘음
WITH bounds AS (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events
WHERE event_type = 'engagement'
ORDER BY user_id, occurred_at
),
final AS (
SELECT *,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR next_event >= INTERVAL '10 MINUTE' OR last_event IS NULL OR next_event IS NULL
),
session AS (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM final
GROUP BY user_id, session
),
x AS (
SELECT e.*,
session,
session_start
FROM tutorial.yammer_events e
LEFT JOIN session s ON e.user_id = s.user_id
AND e.occurred_at >= s.session_start
AND e.occurred_at <= s.session_end
WHERE e.event_type = 'engagement'
),
z AS (
SELECT session_start,
session,
user_id,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN event_name LIKE 'search_click_%' THEN user_id ELSE NULL END) AS clicks
FROM x
GROUP BY session_start, session, user_id
)
SELECT clicks,
sessions,
sessions / SUM(sessions) over() AS sessions_pct
FROM (
SELECT clicks,
COUNT(*) AS sessions
FROM z
WHERE runs > 0
GROUP BY clicks
) sub
3-2.
Q : 한 세션내에서 Full Search를 여러번 했다면 검색결과 클릭도 많이 일어날까?
A : 검색을 여러번 했을 때 클릭도 많이 일어나는 걸로 보임
WITH bounds AS (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events
WHERE event_type = 'engagement'
ORDER BY user_id, occurred_at
),
final AS (
SELECT *,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM bounds
WHERE last_event >= INTERVAL '10 MINUTE' OR next_event >= INTERVAL '10 MINUTE' OR last_event IS NULL OR next_event IS NULL
),
session AS (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM final
GROUP BY user_id, session
),
x AS (
SELECT e.*,
session,
session_start
FROM tutorial.yammer_events e
LEFT JOIN session s ON e.user_id = s.user_id
AND e.occurred_at >= s.session_start
AND e.occurred_at <= s.session_end
WHERE e.event_type = 'engagement'
),
z AS (
SELECT session_start,
session,
user_id,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN event_name LIKE 'search_click_%' THEN user_id ELSE NULL END) AS clicks
FROM x
GROUP BY session_start, session, user_id
)
SELECT runs,
SUM(clicks) AS clicks,
COUNT(*) AS sessions,
SUM(clicks)/COUNT(*) AS avg_clicks
FROM z
WHERE runs > 0
GROUP BY runs
4.
Q : Full search 이후에 나오는 컨텐츠 중에서 몇번째 순서에 있는 컨텐츠를 클릭했는가?
A : Full search 사용후 검색결과 페이지에서 클릭하는 컨텐츠의 순서는 고르게 분포되어 있음. 위의 3번 분석과 연계해서 생각해보면 Full search 이후에 클릭을 1번만 사람은 2.45%임. Full search를 사용한 대부분의 유저는 여러개의 컨텐츠를 클릭해봄. 검색결과의 순서를 정하는 알고리즘이 문제가 아니라 애초에 유저가 검색결과에 만족을 못하고 여러개의 컨텐츠를 클릭해보는 상황일 수도 있음
SELECT TRIM('search_click_result_' FROM event_name)::INT AS search_result,
COUNT(*) AS clicks
FROM tutorial.yammer_events
WHERE event_name LIKE 'search_click_%'
GROUP BY search_result
ORDER BY search_result
5-1.
Q : Full search를 최초 사용 이후에 유저들이 해당 기능을 한달 이내에 다시 사용하는가?
A : 한달 이내에 1회 재사용한 유저는 1177명, 2회는 428명
WITH bounds AS (
SELECT user_id,
event_type,
occurred_at,
occurred_at - LAG(occurred_at, 1) OVER(PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER(PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events
WHERE event_type = 'engagement'
),
final AS (
SELECT *,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER(PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM bounds
WHERE last_event IS NULL or next_event IS NULL or last_event >= INTERVAL '10 MINUTE' or next_event >= INTERVAL '10 MINUTE'
),
session AS (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM final
GROUP BY user_id, session
),
first AS (
SELECT user_id,
MIN(occurred_at) AS first_search
FROM tutorial.yammer_events
WHERE event_name = 'search_run'
GROUP BY user_id
),
x AS (
SELECT e.*,
first.first_search,
session.session,
session.session_start
FROM tutorial.yammer_events e
JOIN first ON e.user_id = first.user_id
AND first.first_search <= '2014-08-01'
LEFT JOIN session ON session.user_id = e.user_id
AND session.session_start <= e.occurred_at
AND session.session_end >= e.occurred_at
AND session.session_start <= first.first_search + INTERVAL '30 DAY'
WHERE e.event_type = 'engagement'
),
z1 AS (
SELECT session_start,
session,
user_id,
first_search,
COUNT(CASE WHEN event_name = 'search_run' THEN user_id ELSE NULL END) AS runs
FROM x
GROUP BY session_start, session, user_id, first_search
),
z2 AS (
SELECT user_id,
COUNT(*) AS searches
FROM z1
WHERE z1.runs > 0
GROUP BY user_id
)
SELECT searches,
COUNT(*) AS users
FROM z2
GROUP BY searches
ORDER BY searches
5_2.
Q : auto_complete를 최초 사용 이후에 유저들이 해당 기능을 한달 이내에 다시 사용하는가?
A : auto_complete를 최초 사용 이후에 한달 이내에 1회 사용한 유저는 1239명, 2회는 1003명
Insight : 상대적으로 full search보다 autocomplete를 재사용한 유저들이 많다.
WITH bounds AS (
SELECT user_id,
event_type,
occurred_at,
occurred_at - LAG(occurred_at, 1) OVER(PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at, 1) OVER(PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER() AS id
FROM tutorial.yammer_events
WHERE event_type = 'engagement'
),
final AS (
SELECT *,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id, 1) OVER(PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM bounds
WHERE last_event IS NULL or next_event IS NULL or last_event >= INTERVAL '10 MINUTE' or next_event >= INTERVAL '10 MINUTE'
),
session AS (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM final
GROUP BY user_id, session
),
first AS (
SELECT user_id,
MIN(occurred_at) AS first_search
FROM tutorial.yammer_events
WHERE event_name = 'search_autocomplete'
GROUP BY user_id
),
x AS (
SELECT e.*,
first.first_search,
session.session,
session.session_start
FROM tutorial.yammer_events e
JOIN first ON e.user_id = first.user_id
AND first.first_search <= '2014-08-01'
LEFT JOIN session ON session.user_id = e.user_id
AND session.session_start <= e.occurred_at
AND session.session_end >= e.occurred_at
AND session.session_start <= first.first_search + INTERVAL '30 DAY'
WHERE e.event_type = 'engagement'
),
z1 AS (
SELECT session_start,
session,
user_id,
first_search,
COUNT(CASE WHEN event_name = 'search_autocomplete' THEN user_id ELSE NULL END) AS runs
FROM x
GROUP BY session_start, session, user_id, first_search
),
z2 AS (
SELECT user_id,
COUNT(*) AS searches
FROM z1
WHERE z1.runs > 0
GROUP BY user_id
)
SELECT searches,
COUNT(*) AS users
FROM z2
GROUP BY searches
ORDER BY searches
Investigating a Drop in User Engagement(프로젝트) (0) | 2022.08.17 |
---|
댓글 영역