Cyclistic Marketing Case Study (2023) - Google Capstone Project - Analyze Phase using Big Query

 Cyclistic Marketing Case Study (2023) - Google Capstone Project - Analyze Phase using Big Query


Introduction

This article shows a series of queries on how the analysis summary is derived. It uses the BigQuery. Why BigQuery? It is because:

  • The analysis summary is derived through a series of structured queries that are executed using a tool called BigQuery.
  • BigQuery is a powerful cloud-based data warehouse provided by Google Cloud Platform.
  • The queries used in the analysis process are designed to extract, transform, and analyze large datasets efficiently.
  • By leveraging BigQuery, users can perform complex analytical tasks on massive volumes of data in a scalable and cost-effective manner.
  • The results obtained from these queries form the basis of the analysis summary, providing valuable insights and trends from the data.
  • Utilizing tools like BigQuery enables organizations to make data-driven decisions and uncover hidden patterns within their dataset


Getting and Formulating Data Query that will be used in Analysis

   A. Mean

        A.1. Mean in a Year        

        --Getting the mean for a year

            WITH

            cleaned_data AS

            (SELECT

              *

            FROM

             `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)


              --Convert ride_length in seconds to get the average

        , Converted_data as

            (select

             started_at

            , ended_at

            ,extract( year from started_at) as Year

            , ride_length

            , extract (hour from ride_length) as Hours

            , extract (minute from ride_length) as Minutes

            , extract (second from ride_length) as Seconds

            from cleaned_data

                )


                , converted_data1 as

            (Select

            Year

            , hours

            ,minutes

            , seconds

            , (hours*60*60)+ (minutes *60)+ seconds as Converted_Seconds

            from converted_Data)


                , converted_Data2 as

            (select

            "Year"

            , avg(converted_seconds) as Average_seconds

            , avg(converted_seconds)/(60*60)

            , split(cast(avg(converted_seconds)/60 as String), ".")[OFFSET(0)] as minutes

            , split(cast(cast(concat("0." ,split(cast(avg(converted_seconds)/60 as String), ".")

            [OFFSET(1)] ) as float64)*60 as String),".") [OFFSET(0)] as seconds

            from converted_Data1)


                select

            "Year"

            , cast (concat ("00:",minutes, ":", seconds) as time) as Average_Time_in_a_Year

            from converted_Data2


        A.2. Mean in a Monthly Basis

                --Getting the monthly mean

            WITH

          cleaned_data AS

            (SELECT

              *

            FROM

              `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)



                --Convert ride_length in seconds to get the average

            , Converted_data as

            (select

            started_at

            , extract( month from started_at) as month0

            ,extract( year from started_at) as Year

            , ride_length

            , extract (hour from ride_length) as Hours

            , extract (minute from ride_length) as Minutes

            , extract (second from ride_length) as Seconds

            from cleaned_data

                )


                , converted_data1 as

            (Select

            month0

            ,case

                when month0 = 1 then 'January'

                when month0 = 2 then 'February'

                when month0 = 3 then 'March'

                when month0 = 4 then 'April'

                when month0 = 5 then 'May'

                when month0 = 6 then 'June'

                when month0 = 7 then 'July'

                when month0 = 8 then 'August'

                when month0 = 9 then 'September'

                when month0 = 10 then 'October'

                when month0 = 11 then 'November'

                when month0 = 12 then 'December'

              else null end as Months

            , hours

            ,minutes

            , seconds

            , (hours*60*60)+ (minutes *60)+ seconds as Converted_Seconds

            from converted_Data)


        , converted_Data2 as

            (select

            month0

            ,months

            ,avg(converted_seconds) over (partition by months) as Average_seconds

            from converted_Data1

            )


            , monthly_mean_data as

            (select

            month0

            ,months

            , average_seconds

            from converted_data2

            group by 1,2,3

            order by 1)


            , monthly_mean_data1 as

            (select

            month0

            , months

            , average_seconds

            , split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as minutes

            , split(cast(cast(concat("0." ,split(cast(average_seconds/60 as String), ".")[OFFSET(1)] ) as

            float64)*60 as String),".") [OFFSET(0)] as seconds

            from monthly_mean_data

            )


            select

        months

            ,cast (concat ("00:",minutes, ":", seconds) as time) as Monthly_mean

            from monthly_mean_data1


        A.2. Mean as to Customer Type

            --Getting the mean as to consumer type


            WITH

              cleaned_data AS

            (SELECT

              *

            FROM

              `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)


            --Convert ride_length in seconds to get the average

            , Converted_data as

            (select

             member_casual

            , ride_length

            , extract (hour from ride_length) as Hours

            , extract (minute from ride_length) as Minutes

            , extract (second from ride_length) as Seconds

            from cleaned_data

            )


            , converted_data1 as

            (Select

             member_casual

            , hours

            ,minutes

            , seconds

            , (hours*60*60)+ (minutes *60)+ seconds as Converted_Seconds

            from converted_Data)


            , converted_Data2 as

            (select

            member_casual

            ,avg(converted_seconds) over (partition by member_casual) as Average_seconds

            from converted_Data1

            )


            , customertype_mean_data as

            (select

            member_casual

            ,average_seconds

            from converted_data2

            group by 1,2

            order by 1)


            , customertype_mean_data1 as

            (select

            member_casual

            , average_seconds

            , split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as minutes

            , split(cast(cast(concat("0." ,split(cast(average_seconds/60 as String), ".")[OFFSET(1)] )                         as float64)*60 as String),".") [OFFSET(0)] as seconds

            from customertype_mean_data

            )


            select

            member_casual

            ,cast (concat ("00:",minutes, ":", seconds) as time) as Custome_type_mean

            from customertype_mean_data1


         A.3. Mean as to Bike Type

        WITH

         cleaned_data AS

        (SELECT

          *

        FROM

          `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)



            --Convert ride_length in seconds to get the average

        , Converted_data as

        (select

        rideable_type

        , ride_length

        , extract (hour from ride_length) as Hours

        , extract (minute from ride_length) as Minutes

        , extract (second from ride_length) as Seconds

        from cleaned_data

        )


            , converted_data1 as

        (Select

        rideable_type

        , hours

        ,minutes

        , seconds

        , (hours*60*60)+ (minutes *60)+ seconds as Converted_Seconds

        from converted_Data)


            , converted_Data2 as

        (select

        rideable_type

        ,avg(converted_seconds) over (partition by rideable_type) as Average_seconds

        from converted_Data1

        )


            , rideabletype_mean_data as

        (select

        rideable_type

        , average_seconds

        from converted_data2

        group by 1,2

        order by 1)


        , rideabletype_mean_data1 as

        (select

        rideable_type

        , average_seconds

        , split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as minutes

        , split(cast(cast(concat("0." ,split(cast(average_seconds/60 as String), ".")[OFFSET(1)] ) as

        float64)*60 as String),".") [OFFSET(0)] as seconds

        from rideabletype_mean_data

        where cast(split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as int64) <= 60

        )


        , rideabletype_mean_data2 as

        (select

        rideable_type

        ,cast (concat ("00:",minutes, ":", seconds) as time) as Monthly_mean

        from rideabletype_mean_data1

        )


        , rideabletype_mean_data3 as

        (select

        rideable_type

        , average_seconds

        , split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as minutes

        , split(cast(cast(concat("0." ,split(cast(average_seconds/60 as String), ".")[OFFSET(1)] )                     as float64)*60 as String),".") [OFFSET(0)] as seconds

        from rideabletype_mean_data

        where cast(split(cast(average_seconds/60 as String), ".")[OFFSET(0)] as int64) > 60

            )


            select

        rideable_type

        ,cast (concat ("01:",cast(minutes as int64)-60, ":", seconds) as time) as rideable_type_mean

        from rideabletype_mean_data3


            union all


            (select

        rideable_type

        ,cast (concat ("00:",minutes, ":", seconds) as time) as rideable_type_mean

        from rideabletype_mean_data1

        )


    B. Casual Highest Ride Length

             ---Objective: Calculate the highest ride_length casual


            WITH

          cleaned_data AS

        (SELECT * FROM `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)


            (Select ride_id

        ,member_casual

        ,ride_length

        from cleaned_data

        where member_casual = "casual"

        order by 3 desc

        Limit 1000

        )


    C. Getting the Numbers of Riders per Days

            --Objective: Calculate the count days of the week


            WITH

              cleaned_data AS

            (SELECT * FROM `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`)


            select weekdays

                , count (weekdays) as number_of_riders

                , "casual" as member_type

            from cleaned_data

            where member_casual = "casual"

            group by 1


            union all


            select weekdays

            , count (weekdays) as number_of_riders

            , "member" as member_type

            from cleaned_data

            where member_casual = "member"

            group by 1


    D. Getting the Top 1000 places where a Casual Member Rent Bicycle

            --Objective: Getting the list of Station name and its Station ID using a Join Function

            --Getting a list of places where the casual members started to use the bicycles limit to 1000                     places


            with start0 as

            (SELECT

            ride_id

            ,"Start" As Start

            ,Start_lat

            ,start_lng

            , station_name as start_station_name

            , station_id as start_station_id

            ,"End" as End_

            , End_lat

            , End_lng

            , member_casual

             FROM `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Clean_Data`

            left join `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Station_Data`

            on start_lat = latittude

            and start_lng = longitude)


            , end0 as

            (select

            ride_id

            ,Start

            ,Start_lat

            ,start_lng

            , start_station_name

            , start_station_id

            , End_

            , End_lat

            , End_lng

            , station_name as End_station_name

            , station_id as End_station_id

            , member_casual

            from start0

            left join `gold-pod-375712.Cyclistic_Cleaned_Data.Cyclistic_Station_Data`

            on end_lat = latittude

            and end_lng = longitude)


            , casual0 as

            (select * from end0

            where member_casual = "casual"

            )


            , start1 as

            (select  

            Start

            , Start_lat

            ,start_lng

            , start_station_name

            , start_station_id

            , Count(concat (Start_lat, "-", start_lng)) as Counters

            from casual0

            Group by 1,2,3,4,5)


            select *

            from start1

            where counters > 1

            order by counters desc

            Limit 1000


Announcement

Exciting news! It is thrilled to announce a fantastic opportunity for you to boost your knowledge! Coursera recently launched a new promo: (50% off first month of Coursera Plus) and the following courses: The Microsoft UX Design Professional, Google Cloud Data Analytics Professional, and Google Cloud Cybersecurity Professional Certificates.


Promo Details:

50% off the first month of Coursera Plus



Microsoft UX Design Professional 

By the end of this Professional Certificate, a learner will be able to describe the core principles of UX, Human-Centered Design, and design thinking frameworks, and identify UX team roles and user research methods.

Translate user insights into actionable design solutions by analyzing user needs, using design ideation techniques, storyboarding, and applying principles of information architecture.

Adapt design ideas into interactive prototypes using wireframing, mockups, prototyping tools, leverage design systems and style guides to ensure consistency; gather valuable feedback to refine solutions through user testing and design critiques.

Apply visual design principles and craft high-fidelity mockups that adhere to platform best practices and ensure accessibility for all; explore the use of AI tools to streamline workflows and curate a compelling UX design portfolio.


Google Cloud Data Analytics 

Hands-On Experience with Google Cloud Tools: Participants gain practical experience with Google Cloud's data analytics tools, including BigQuery, enabling them to handle real-world challenges and work efficiently in a cloud-based data environment. Comprehensive Data Analytics Learning: The Google Cloud Data Analytics Certificate provides a thorough understanding of data analytics in the cloud, from data structuring and storage, through transformation using SQL, to critical data visualization and storytelling techniques.


Google Cloud Cybersecurity

Expert-led Cybersecurity Training: This certificate program, led by Google Cloud experts, offers a comprehensive introduction to security principles in cloud computing, cloud risk management frameworks, and techniques for mitigating attacks, equipping learners with cutting-edge cybersecurity knowledge.

 

 

Post a Comment

0 Comments