Cyclistic Marketing Case Study (2023) - Google Capstone Project - Comprehensive

Cyclistic Marketing Case Study - Google Capstone Project - Comprehensive

Note: This blog post is my thorough output of the Cyclistic Marketing Case Study as a part of Google Capstone Project. I already did this case study before (check here), and that time, I used Google Sheets only. Now, I tried to incorporate it by using SQL and any other data analytic tools to make it more comprehensive.

Moreover, this post may contain affiliate links that at no additional cost to you, I may earn a small commission. Thank you for your support.

Business Tasks

Cyclistic Company defined casual riders are those customers who purchase either single-ride passes or full-day ride passes, while they defined Cyclistic members are those customers who purchased annual memberships. Concluded by financial analysts, the company has identified that its Cyclistic members contribute more significantly to the company's profits than casual riders. Maam Lily Moreno, a marketing director and also a stakeholder in this project, has decided to shift the focus of the marketing strategy away from acquiring new customers. Instead, the goal is to convert casual riders into Cyclistic members. As part of the company’s marketing team, she seeks to understand the differences between casual riders and Cyclistic members and how these differences can impact the design of the new marketing campaign.

Enroll Now (Financial Aid Available): Google Data Analytics Professional Certificate

Description of Data Sources Used

The company provides a historical trip dataset that is every month, starting from December 2022 to November 2023. You can verify the Cyclistic trip data here. The data is organized by Motivate International Inc. which has a data license agreement. Listed below are the 12 .csv files that are used to analyze the data.

  1. 202212-divvy-tripdata
  2. 202301-divvy-tripdata
  3. 202302-divvy-tripdata
  4. 202303-divvy-tripdata
  5. 202304-divvy-tripdata
  6. 202305-divvy-tripdata
  7. 202306-divvy-tripdata
  8. 202307-divvy-tripdata
  9. 202308-divvy-tripdata
  10. 202309-divvy-tripdata
  11. 202310-divvy-tripdata
  12. 202311-divvy-tripdata

Every file contains trip data and is structured in a long form. Part of the data license agreement is to prohibit the use of riders’ personally identifiable information for data privacy so, it cannot access the credit card numbers of every rider. The following are the columns that are indicated in the 12 CSV files.

  • ride_id
  • rideable_type
  • started_at
  • ended_at
  • start_station_name
  • start_station_id
  • end_station_name
  • end_station_id
  • start_lat
  • start_lng
  • end_lat
  • end_lng
  • member_casual

The data sources maintain ROCCC.

  • Reliable- The datasets contain accurate and unbiased information on Cyclistic’s transactions every day.
  • Original- The datasets is public and the source is based on the transactions in Chicago and has a license to use the data.
  • Comprehensive- There is a column that differentiates between casual riders and members riders.
  • Current- The latest public dataset available is December 2023 and the scope of this analysis one year.
  • Cited- The data is filed by Motivate International Inc. which has a data license agreement. Allow us to use the datasets in public.

Even though data about the station name and ID are not fully complete and mostly blanks, it does not affect getting information about the difference between the two. Hence, the remaining data is enough to achieve our business task and have a great contribution to making a new plan for the marketing campaign. 

Documentation of Data Cleaning and Data Manipulation

Microsoft Excel: Initial Data Cleaning and Manipulation

As part of the Prepare phase, I downloaded all the 12 zip files and unzipped them. I opened it and converted it from CSV to XLS files. I created folders for the CSV and XLS files. For each XLS file, I did the following:

  1. Created a column named ride_length, in which it subtracts the ended_at and started_at (=d2-c2) and formatted as hh:mm:ss.
  2. Created a column named day_of_week, in which determines the days of the week assigned by their numbers, and formatted as Numbers with no decimals.
  3. Then converted all the files into CSV files for uploading in Big Query.

Station name will be used in joins in Big Query.

Since my account in BigQuery was in trial mode, file sizes were limited to 100 MB, so some of the updated CSV files could not be uploaded. To pursue it, I modified again the CSV files:

  1. Opened new Excel file that is used for the station name, station id, and its longitude and latitude,
  2. Copied the start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, and end_lng columns from each CSV files.
  3. Pasted it in the Excel Files and used Remove Duplicates to delete all the duplications in the Excel and remains the unique data
  4. Saved it into CSV files and named as Station_name
  5. Deleted the start_station_name, start_station_id, end_station_name, and end_station_id column in each CSV file that I’d copied to reduce the file size.
  6. In that case, it would have a two sets of CSV files one for the monthly data and the other for the station names.
  7. All the CSV files were uploaded to BigQuery.

During the uploading of BigQuery, Some of the CSV files regarding to the datatrip did not continue because of the errors. I found out that the cause of the error is that time indicated in ended_at was earlier than the started_at. I did the following:

  1. Modified the ride_length in each CSV files by using an absolute to remove the negative symbol i.e. =abs(d2-c2)
  2. Saved it and uploaded it again.

BigQuery: Further Data Cleaning or Manipulating using Structured Query Language

Using the BigQuery SQL, I unite all the monthly data into one database using the UNION command. I also named the day_of_week column to the actual days of the week. See the query below:

--Objective: To clean or manipulate the data
--To unite all monthly data

With raw_data as

(SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202212-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202301-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202302-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202303-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202304-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202305-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202306-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202307-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202308-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202309-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202310-divvy-tripdata`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.202311-divvy-tripdata`

)


--Naming day_of_week into actual days

(select *

, case

  when day_of_week = 1 then 'Sunday'

  when day_of_week = 2 then 'Monday'

  when day_of_week = 3 then 'Tuesday'

  when day_of_week = 4 then 'Wednesday'

  when day_of_week = 5 then 'Thursday'

  when day_of_week = 6 then 'Friday'

  when day_of_week = 7 then 'Saturday'

  else null end as Weekdays

from raw_data

)

I also unite all the station names. The processed station names were also be used for analysis. See the query below:

--To unite all the station name

with raw_data_station_name as (

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.Stations_name-p1`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.Stations_name-p2`

union all

SELECT * FROM `gold-pod-375712.Cyclistic_Marketing_Study.Stations_name-p3`

)


--to get the start station name

,start0 as

(select

"Start" as Point

, start_lat

, start_lng

, start_station_name

, start_station_id

from raw_data_station_name)


, start1 as

(select * from start0

group by 1,2,3,4,5

)


, end0 as

(select

"End" as point

, end_lat

, end_lng

, end_station_name

, end_station_id

from raw_data_station_name)


, end1 as

(select * from end0

group by 1,2,3,4,5

)


--Uniting all the stations (both starting points and ending points)


,uniting as

(select * from end1

union all

select * from start1

order by 1 desc

)


select point

, end_lat as latittude

, end_lng as longitude

, end_station_id as station_id

, end_station_name as station_name

from uniting

group by 1,2,3,4,5


All the queries indicated above were saved as a view:

  • Cyclistic_Clean_Data
  • Cyclistic_Station_Data

Summary of Analysis 

In our recent analysis using BigQuery SQL, we leveraged the platform's extensive data processing capabilities to uncover critical insights from large datasets. By employing SQL queries (you can check here my SQL queries), we were able to efficiently filter, aggregate, and visualize data trends, enabling us to identify key patterns and correlations. This powerful approach not only streamlined our data analysis process but also provided a foundation for making data-driven decisions and strategic recommendations. Here are the results in a list form:
  • Mean in a Year- 00:15:21
  • Mean as to per Month
    • January- 00:10:57
    • February- 00:11:51
    • March- 00:11:35
    • April- 00:14:52
    • May- 00:16:28
    • June- 00:16:49
    • July- 00:17:41
    • August- 00:17:00
    • September- 00:16:00
    • October- 00:14:02
    • November- 00:12:20
    • December- 00:11:15
  • Mean as to Customer Type
    • Casual - 00:21:10
    • Member - 00:12:03
  • Mean as to Rideable Type
    • Classic Bike: 00:17:09
    • Docked Bike: 01:06:30
    • Electric Bike: 00:12:18
  • Maximum Casual Ride Length- 24:00:00 (1 day)
  • Number of Rider per Type per Day
    • Casual
      • Sunday- 333,585
      • Monday- 234,719
      • Tuesday- 246,821
      • Wednesday- 247,864
      • Thursday- 271,151
      • Friday- 309,570
      • Saturday- 408,691
    • Member
      • Sunday- 402,607
      • Monday- 491,868
      • Tuesday- 575,396
      • Wednesday- 578,840
      • Thursday- 588,144
      • Friday- 521,939
      • Saturday- 466,415
  • Top 10 Starting Point Places where Casual Riders Go
    • Streeter Dr & Grand Avenu
    • DuSable Lake Shore Dr & Monroe St.
    • DuSable Lake Shore Dr & North Blvd
    • Michigan Ave & Oak St.
    • Theater on the Lake
    • Millenium Park
    • Kingsbury St, & Kinzie 
    • Halsted St.& Fulton St.
    • N. Carpenter St. & WLake St.
    • N. Green St. & W Lake St.
  • Top 10 Ending Point Places where Casual Riders Go
    • Streeter Dr & Grand Avenu
    • DuSable Lake Shore Dr & Monroe St.
    • DuSable Lake Shore Dr & North Blvd
    • Michigan Ave & Oak St.
    • Theater on the Lake
    • Millenium Park
    • Kingsbury St, & Kinzie 
    • Halsted St.& Fulton St.
    • N. Carpenter St. & WLake St.
    • N. Green St. & W Lake St.

Key Findings

After analyzing the data, I have identified several key insights that I would like to share. Utilizing Tableau and Excel, I created engaging visualizations to enhance both clarity and informativeness. By the conclusion of this section, you will understand the differing usage patterns of casual users and members of Cyclistic bikes.

Monthly Mean Usage of Cyclistic Customers

Monthly Mean Usage of Cyclistic Customers

As illustrated in the line chart, Cyclistic customers exhibit the highest mean bike usage from May to September, with July representing the peak period in terms of duration. This indicates that these months are the busiest for bike usage, with customers utilizing the bikes for extended periods. Conversely, the usage tends to decline significantly between December and March, marking this period as a time of reduced activity.

Number of Users per Ride Type per Day

Number of Users per Rider Type per Day

Among all riders, there is a higher proportion of members compared to casual riders. When analyzing the casual riding patterns, Saturday and Sunday account for the largest numbers, representing 48% (408,691 out of 855,106) and 45% (333,585 out of 736,193) of total riders on those days, respectively.

In contrast, the peak days for annual riders are Tuesday, Wednesday, and Thursday, indicating a decline in casual rider participation during the week.

Start and End Location of a Casual Riders

Start Location of a Casual Rider
End Location of a Casual Rider

The Geo maps reveal that a significant number of casual riders initiate their bike rentals near Lake Michigan, specifically around Grand Avenue, and conclude their rides in the same vicinity. This pattern suggests that casual riders are increasingly choosing to spend their weekends in this area.

Recommendations

Based on the findings presented, I have developed my top recommendations.
  • Implement a comprehensive marketing campaign featuring discounts and promotions during the months of June, July, and August, particularly on weekends, as this is when the majority of casual riders tend to use our services.
  • To reduce marketing expenses, I recommend utilizing banner advertisements or tarpaulins placed near Michigan Lake. Distributing flyers is not advisable.
  • Identify whether casual riders are tourists or residents. Non-tourists represent a potential pool of membership candidates, so focusing on this demographic may increase our chances of converting them into members.
  • It is essential to establish a reasonable pricing structure that differentiates between and member riders.

Google Data Analytics: 8-Course Series in Data Analytics

Prepare to embark on a new career in the rapidly growing field of data analytics, with no prior experience or degree required. This professional training, designed by Google, offers you the opportunity to connect with leading employers. Currently, there are 483,000 open positions in data analytics, with a median entry-level salary of $92,000 (as of this blog post date).

Post a Comment

0 Comments