Introduction
This article shows a series of queries on how the analysis summary is derived. It uses the BigQuery. Why BigQuery? It is because:
- The analysis summary is derived through a series of structured queries that are executed using a tool called BigQuery.
- BigQuery is a powerful cloud-based data warehouse provided by Google Cloud Platform.
- The queries used in the analysis process are designed to extract, transform, and analyze large datasets efficiently.
- By leveraging BigQuery, users can perform complex analytical tasks on massive volumes of data in a scalable and cost-effective manner.
- The results obtained from these queries form the basis of the analysis summary, providing valuable insights and trends from the data.
- Utilizing tools like BigQuery enables organizations to make data-driven decisions and uncover hidden patterns within their dataset
Getting and Formulating Data Query that will be used in Analysis
A. Mean
A.1. Mean in a Year
--Getting the mean for a year
WITH
cleaned_data AS
(SELECT
*
FROM
`gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)
--Convert ride_length in seconds to
get the average
, Converted_data as
(select
started_at
, ended_at
,extract( year from started_at) as Year
, ride_length
, extract (hour from ride_length) as Hours
, extract (minute from ride_length) as Minutes
, extract (second from ride_length) as Seconds
from cleaned_data
)
, converted_data1 as
(Select
Year
, hours
,minutes
, seconds
, (hours*60*60)+ (minutes *60)+ seconds as Converted_Seconds
from converted_Data)
, converted_Data2 as
(select
"Year"
, avg(converted_seconds) as Average_seconds
, avg(converted_seconds)/(60*60)
, split(cast(avg(converted_seconds)/60 as String), ".")[OFFSET(0)] as minutes
, split(cast(cast(concat("0." ,split(cast(avg(converted_seconds)/60 as String), ".")
[OFFSET(1)] ) as float64)*60 as String),".") [OFFSET(0)] as seconds
from converted_Data1)
select
"Year"
, cast (concat ("00:",minutes, ":", seconds) as time) as Average_Time_in_a_Year
from converted_Data2
A.2. Mean in a Monthly Basis
WITH
cleaned_data AS
(SELECT
*
FROM
`gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)
--Convert ride_length in seconds to
get the average
, Converted_data as
(select
started_at
, extract( month from started_at) as month0
,extract( year from started_at) as Year
, ride_length
, extract (hour from ride_length) as Hours
, extract (minute from ride_length) as Minutes
, extract (second from ride_length) as Seconds
from cleaned_data
)
, converted_data1 as
(Select
month0
,case
when month0 = 1 then 'January'
when month0 = 2 then 'February'
when month0 = 3 then 'March'
when month0 = 4 then 'April'
when month0 = 5 then 'May'
when month0 = 6 then 'June'
when month0 = 7 then 'July'
when month0 = 8 then 'August'
when month0 = 9 then 'September'
when month0 = 10 then 'October'
when month0 = 11 then 'November'
when month0 = 12 then 'December'
else null end as Months
, hours
,minutes
, seconds
, (hours*60*60)+ (minutes *60)+ seconds as Converted_Seconds
from converted_Data)
, converted_Data2 as
(select
month0
,months
,avg(converted_seconds) over (partition by months) as Average_seconds
from converted_Data1
)
, monthly_mean_data as
(select
month0
,months
, average_seconds
from converted_data2
group by 1,2,3
order by 1)
, monthly_mean_data1 as
(select
month0
, months
, average_seconds
, split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as minutes
, split(cast(cast(concat("0." ,split(cast(average_seconds/60 as String), ".")[OFFSET(1)] ) as
float64)*60 as String),".") [OFFSET(0)] as seconds
from monthly_mean_data
)
select
months
,cast (concat ("00:",minutes, ":", seconds) as time) as Monthly_mean
from monthly_mean_data1
A.2. Mean as to Customer Type
--Getting the
mean as to consumer type
WITH
cleaned_data AS
(SELECT
*
FROM
`gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)
--Convert ride_length in seconds to
get the average
, Converted_data as
(select
member_casual
, ride_length
, extract (hour from ride_length) as Hours
, extract (minute from ride_length) as Minutes
, extract (second from ride_length) as Seconds
from cleaned_data
)
, converted_data1 as
(Select
member_casual
, hours
,minutes
, seconds
, (hours*60*60)+ (minutes *60)+ seconds as Converted_Seconds
from converted_Data)
, converted_Data2 as
(select
member_casual
,avg(converted_seconds) over (partition by member_casual) as Average_seconds
from converted_Data1
)
, customertype_mean_data as
(select
member_casual
,average_seconds
from converted_data2
group by 1,2
order by 1)
, customertype_mean_data1 as
(select
member_casual
, average_seconds
, split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as minutes
, split(cast(cast(concat("0." ,split(cast(average_seconds/60 as String), ".")[OFFSET(1)] ) as float64)*60 as String),".") [OFFSET(0)] as seconds
from customertype_mean_data
)
select
member_casual
,cast (concat ("00:",minutes, ":", seconds) as time) as Custome_type_mean
from customertype_mean_data1
A.3. Mean as to Bike Type
WITH
cleaned_data AS
(SELECT
*
FROM
`gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)
--Convert ride_length in seconds to
get the average
, Converted_data as
(select
rideable_type
, ride_length
, extract (hour from ride_length) as Hours
, extract (minute from ride_length) as Minutes
, extract (second from ride_length) as Seconds
from cleaned_data
)
, converted_data1 as
(Select
rideable_type
, hours
,minutes
, seconds
, (hours*60*60)+ (minutes *60)+ seconds as Converted_Seconds
from converted_Data)
, converted_Data2 as
(select
rideable_type
,avg(converted_seconds) over (partition by rideable_type) as Average_seconds
from converted_Data1
)
, rideabletype_mean_data as
(select
rideable_type
, average_seconds
from converted_data2
group by 1,2
order by 1)
, rideabletype_mean_data1 as
(select
rideable_type
, average_seconds
, split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as minutes
, split(cast(cast(concat("0." ,split(cast(average_seconds/60 as String), ".")[OFFSET(1)] ) as
float64)*60 as String),".") [OFFSET(0)] as seconds
from rideabletype_mean_data
where cast(split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as int64) <= 60
)
, rideabletype_mean_data2 as
(select
rideable_type
,cast (concat ("00:",minutes, ":", seconds) as time) as Monthly_mean
from rideabletype_mean_data1
)
, rideabletype_mean_data3 as
(select
rideable_type
, average_seconds
, split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as minutes
, split(cast(cast(concat("0."
,split(cast(average_seconds/60 as String), ".")[OFFSET(1)] ) as float64)*60 as String),".") [OFFSET(0)] as seconds
from rideabletype_mean_data
where cast(split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as int64) > 60
)
select
rideable_type
,cast (concat ("01:",cast(minutes as int64)-60, ":", seconds) as time) as rideable_type_mean
from rideabletype_mean_data3
union all
(select
rideable_type
,cast (concat ("00:",minutes, ":", seconds) as time) as rideable_type_mean
from rideabletype_mean_data1
)
B. Casual Highest Ride Length
---Objective: Calculate the highest ride_length casual
WITH
cleaned_data AS
(SELECT * FROM `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)
(Select ride_id
,member_casual
,ride_length
from cleaned_data
where member_casual = "casual"
order by 3 desc
Limit 1000
)
C. Getting the Numbers of Riders per Days
--Objective: Calculate the count days of the week
WITH
cleaned_data AS
(SELECT * FROM `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)
select weekdays
, count (weekdays) as number_of_riders
, "casual" as member_type
from cleaned_data
where member_casual = "casual"
group by 1
union all
select weekdays
, count (weekdays) as number_of_riders
, "member" as member_type
from cleaned_data
where member_casual = "member"
group by 1
D. Getting the Top 1000 places where a Casual Member Rent Bicycle
--Objective:
Getting the list of Station name and its Station ID using a Join Function
--Getting a
list of places where the casual members started to use the bicycles limit to
1000 places
with start0 as
(SELECT
ride_id
,"Start" As Start
,Start_lat
,start_lng
, station_name as start_station_name
, station_id as start_station_id
,"End" as End_
, End_lat
, End_lng
, member_casual
FROM `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`
left join `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Station_Data`
on start_lat = latittude
and start_lng = longitude)
, end0 as
(select
ride_id
,Start
,Start_lat
,start_lng
, start_station_name
, start_station_id
, End_
, End_lat
, End_lng
, station_name as End_station_name
, station_id as End_station_id
, member_casual
from start0
left join `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Station_Data`
on end_lat = latittude
and end_lng = longitude)
, casual0 as
(select * from end0
where member_casual =
"casual"
)
, start1 as
(select
Start
, Start_lat
,start_lng
, start_station_name
, start_station_id
, Count(concat (Start_lat, "-",
start_lng)) as Counters
from casual0
Group by 1,2,3,4,5)
select *
from start1
where counters > 1
order by counters desc
Limit 1000
Announcement
Exciting news! It is thrilled to announce a fantastic opportunity for you to boost your knowledge! Coursera recently launched a new promo: (50% off first month of Coursera Plus) and the following courses: The Microsoft UX Design Professional, Google Cloud Data Analytics Professional, and Google Cloud Cybersecurity Professional Certificates.
Promo Details:
50% off the first month of Coursera Plus
Microsoft UX Design Professional
By the end of this Professional Certificate, a learner will be able to describe the core principles of UX, Human-Centered Design, and design thinking frameworks, and identify UX team roles and user research methods.
Translate user insights into actionable design solutions by analyzing user needs, using design ideation techniques, storyboarding, and applying principles of information architecture.
Adapt design ideas into interactive prototypes using wireframing, mockups, prototyping tools, leverage design systems and style guides to ensure consistency; gather valuable feedback to refine solutions through user testing and design critiques.
Apply visual design principles and craft high-fidelity mockups that adhere to platform best practices and ensure accessibility for all; explore the use of AI tools to streamline workflows and curate a compelling UX design portfolio.
Hands-On Experience with Google Cloud Tools: Participants gain practical experience with Google Cloud's data analytics tools, including BigQuery, enabling them to handle real-world challenges and work efficiently in a cloud-based data environment. Comprehensive Data Analytics Learning: The Google Cloud Data Analytics Certificate provides a thorough understanding of data analytics in the cloud, from data structuring and storage, through transformation using SQL, to critical data visualization and storytelling techniques.
Expert-led Cybersecurity Training: This certificate program, led by Google Cloud experts, offers a comprehensive introduction to security principles in cloud computing, cloud risk management frameworks, and techniques for mitigating attacks, equipping learners with cutting-edge cybersecurity knowledge.
0 Comments