June 2023
The Los Angeles Police Department (LAPD, the Dept.) has faced severe scrutiny over the years, and criticisms reignited after 2020, when hundreds of cases were reviewed after finding 3 officers were involved in misconduct with those cases. The allegations and frustrations from the Los Angeles community, in congruence with Mayor Garcetti's Executive Directive 27, empowered the LAPD to establish a Diversity, Equity, and Inclusion Officer (DEIO), who was responsible for establishing DEI initiatives within the Dept. The initial roadmap of initiatives included:
Data Analysis – The Department is committed to evaluating its own data practices to increase accuracy related to data reporting. Accurate data will allow for changes within the Department based on Departmental initiatives. It is vital that the Department continues to evaluate data in order to provide the most professional law enforcement service available.
Since laying out the DEI plan in October 2021, the Dept. has provided additional data to data.gov on its crime reports. It's now March 2023, and the Chief of Police is recommitting to the Dept.'s goal of building and fostering a culture that respects the humanity of all people. In an effort to support the transparency and visibility of DEI initiatives, the Chief has hired a consulting team of data analysts to review the data released by the Dept., and has asked them to deliver a report detailing the following:
A review of current data sources: Crime Data from 2020 to Present and Arrest Data from 2020 to Present - note data cleaning tool(s) used.
Analysis and visualizations in trends of reported data, and correlation, if any, to current DEI initiatives.
A review of current DEI initiatives, and after reviewing these and the data, provide:
At least 3 recommendations for improving data collection and transparency in support of DEI
Recommended next course of action
For this project, I wanted to get in a little more practice with SQL specifically and try out PowerBI, as I had not worked with this platform before this case study. I used the following data sets, made available with this license, through LA City's data website:
Crime Data from 2020 to Present - crimes reported from 2020 to present
Arrest Data from 2020 to Present - arrests made (bookings & RFCs)
For this project:
My team lead has assigned me the task of reviewing both data sets, but has asked that I focus on Crimes Reported 2020 to Present, and provide some insights into trends, and any thoughts I have re: DEI recommendations to review with the team before providing to the Chief.
I used SQL to clean and analyze the data. This was my first unguided case study, so I had a lot of #learningmoments. Below are the steps I took, and some of the queries I used. Full doc of queries can be found on Github.
To begin cleaning and analyzing the data, I first reviewed the spreadsheets in a text editor (Notepad++) to get a high-level visual of how the data was structured, how many rows I should anticipate working with, and proceeded to upload the files to BigQuery. After glancing through the documentation, I made some initial observations/questions to guide my cleaning and analysis process. Those questions, queries, and results are below:
Are the report_ids unique? Yes, all values are unique.
SELECT DISTINCT Report_ID
FROM `lapd-data.arrests_and_crime_reports_2020_present.arrests_2020_present`
How many report types are there? 2 types: booking & RFC.
Booking - PD registers and enters charges against a person believed to have violated the law
RFC - Release From Custody - officer charges a person and issues an order to appear in court
SELECT DISTINCT Report_Type
FROM `lapd-data.arrests_and_crime_reports_2020_present.arrests_2020_present
Total # of bookings vs RFCs?
162323 bookings
SELECT COUNT (Report_Type)
FROM `lapd-data.arrests_and_crime_reports_2020_present.arrests_2020_present`
WHERE Report_Type = 'BOOKING'
40881 RFC's
SELECT COUNT (Report_Type)
FROM `lapd-data.arrests_and_crime_reports_2020_present.arrests_2020_present`
WHERE Report_Type = 'RFC'
How many unique charge types (descriptions) are there? 27 unique (excluding "null")
SELECT DISTINCT Charge_Group_Description
FROM `lapd-data.arrests_and_crime_reports_2020_present.arrests_2020_present`
WHERE Charge_Group_Description <> 'null'
How many unique charge codes are there? 3390 unique charge codes found
SELECT DISTINCT Charge
FROM `lapd-data.arrests_and_crime_reports_2020_present.arrests_2020_present`
Do the DR_NO values from arrest data, and report_ids from crime data correlate? These numbers do not correlate. No results.
SELECT Report_ID
FROM ((`lapd-data.arrests_and_crime_reports_2020_present.arrests_2020_present`
INNER JOIN `lapd-data.arrests_and_crime_reports_2020_present.crime_2020_present_1`
ON `lapd-data.arrests_and_crime_reports_2020_present.arrests_2020_present`.Report_ID = `lapd-data.arrests_and_crime_reports_2020_present.crime_2020_present_1`.DR_NO)
INNER JOIN `lapd-data.arrests_and_crime_reports_2020_present.crime_2020_present_2`
ON `lapd-data.arrests_and_crime_reports_2020_present.arrests_2020_present`.Report_ID =`lapd-data.arrests_and_crime_reports_2020_present.crime_2020_present_2`.DR_NO)
Recombine tables (had to split original csv file in 2 to upload to BigQuery)
SELECT *
FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_2020_present_1`
UNION ALL
SELECT *
FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_2020_present_2`
Are the DR_NO values unique? Yes, these values are unique. I wanted practice with nesting queries, so ran both of the below:
SELECT DISTINCT DR_NO
FROM (
SELECT DR_NO FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_2020_present_2`
UNION ALL
SELECT DR_NO FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_2020_present_1` )
SELECT DISTINCT DR_NO
FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_combined`
How many unique crime descriptions are there? 138 descriptions
SELECT DISTINCT Crm_Cd_Desc
FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_combined`
Difference in time between when an incident occurred vs how long after it was reported?
Added a new calculated column reporting_timeframe to find the difference between when an incident occurred vs how long after it was reported:
ALTER TABLE `lapd-data.arrests_and_crime_reports_2020_present.crime_combined`
ADD COLUMN `reporting_timeframe` INTEGER;
ALTER TABLE `lapd-data.arrests_and_crime_reports_2020_present.crime_combined`
ALTER COLUMN reporting_timeframe SET DATA TYPE NUMERIC;
UPDATE `lapd-data.arrests_and_crime_reports_2020_present.crime_combined`
SET reporting_timeframe = date_diff(Date_Rptd, DATE_OCC, DAY)
WHERE TRUE
Functional analysis of this reporting time frame field
Calculating the MIN
SELECT MIN(reporting_timeframe)
FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_combined`
Result: 0
Calculating the MAX
SELECT MAX(reporting_timeframe)
FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_combined`
Result: 1172
Calculating the AVG
SELECT AVG(reporting_timeframe)
FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_combined`
Result: 9.673277625
Calculating the MODE
Mode = most frequent recurring number
SELECT reporting_timeframe, COUNT(1) AS frequency
FROM `lapd-data.arrests_and_crime_reports_2020_present.crime_combined`
WHERE reporting_timeframe IS NOT NULL
GROUP BY reporting_timeframe
ORDER BY frequency DESC
Result: 0 day (same day) reporting @ 339905 instances.
~50% of the time it's reported same day, ~21% reported within 1 day
Made note of results, and saved updated table as CSV with the reporting_timeframe column included, for further analysis in PowerBI.
To better see relationships in the data, I used PowerBI. Below are the visualizations, and the insights I gained. These visuals focus on the Crime Data 2020 to Present dataset.
TOP: ~686K crimes were reported from Jan 01, 2020 to Mar 23, 2023, with lower numbers through 2020 - 2021 (assuming because of the Global Pandemic/Lockdown). Crime rates have been on the rise since the city/state/country reopened more fully in 2021.
BOTTOM: Breakdown of crimes reported in percent. Stolen vehicles, battery, identity theft, vehicular burglary, and vandalism seem to plague the city at the highest levels, with these 5 categories making up 38% of all crimes reported.
I next reviewed reported crimes by victim sex (Left) and race/ethnicity (Right). There are more Male victims of crimes than Female, but the stats leave room for improvement, as *LAPD only officially recognizes 2 sexes/gender identities, and had erroneous indicators in the data sets. Erroneous and blank entries were removed from chart at left.
The stats for race/ethnicity leave room for improvement as well, but more to come in the Recommendations section of this report. For the pie chart at right, I excluded blank entries. We see a breakdown of victims being primarily of Latine and White backgrounds (59.16% of all victims), which makes sense considering these are the 2 largest population groups in LA. These stats seem to align with LA city demographics in terms of makeup. Something interesting to note is that while White and Latine folks make up a majority of LA population, they are victims of crime in percentages less than their population makeup (i.e., Latine/Hispanic folks make up 44% of LA population, but are victims of crime 35% of the time, White folks make up 28% of the population, but are victims of crime 24% of the time). This is not the case for Black residents in LA. While individuals who identify as Black make up only 8% of the LA county population, they are victims of crime 17% of the time. That's more than double the amount of other racial/ethnic groups.
I used this Tree Map to get a better visual breakdown of the most prominent crime types, by victims' sex. The big overall colored box represents the crime (the bigger the box, the more reported the crime), and is divided into 2 smaller boxes to show the amounts that each sex reports being the victim of that crime, and at what percent.
We saw earlier that men tend to be more likely to report being a victim of crime, so this map shows us that most crimes have higher male victim percentages.
This is not the case for crimes like: simple intimate partner assault, aggravated intimate partner assault, violations of court and restraining orders, person theft, lewd letters & telephone calls, rape, child abuse, crimes against children, indecent exposure, sexual penetration with a foreign object, kidnapping, stalking, battery with sexual contact, and still, more, where women/girls are the victims at much higher percentage rates (2-4x more often for most of these crimes).
The data does not give the full story, as not all victims (M, F, or other) come forward about crimes committed against them. Additionally, because LAPD only recognizes 2 official sexes/genders, we do not have stats for crimes against individuals who identify as non-binary, trans, intersex, etc.
Another visualization of the same information above, but with high level stats for each assault type. For this chart, I included the previously excluded erroneous entries, and I noticed that crimes documented with an X, H, or no indicator were more often than not, crimes against a business (theft, embezzlement, arson, etc.). I would suggest we still gather the data of the business owner, but add an additional check box for "Business" to know if the crime was against a person, their place of work, or potentially both (hate crimes).
I next created a tree map of crime victims by race/ethnicity, and reported crime type. White populations tend to report being victims of physical theft crimes more than any other population (types of theft include: plain, petty, burgarly, vehicular, grand-theft auto, etc.). Latine and Black populations tend to be largest victims of assault (simple and aggravated), criminal threats, intimate partner assault, forcible rape, attempted robbery, and criminal threats. I would recommend reviewing how the dept captures data about race and ethnicity. The AAPI community is made up of many of the nationalities listed here, and it's the only ethnic group where there are many nationalities accounted for individually. The dept should work with the community to either: individualize every potential nationality, or, match the standards set forth by the US Census, so data may be more easily correlated with census data provided.
Finally, I reviewed how long on average it takes for F vs M victims to report crimes. Men are more likely to report crimes within 9 days of the crime occurring, while F victims are more likely to report within 12 days. This may speak to the reluctance of F victims to come forward about sexual assault or other abuses, but further research is needed. Also, because M & F are the only official recognized genders by LAPD, we do not have statistics specifically against trans or non-binary identifying individuals.
I also reviewed how long on average it takes to report each type of crime by race/ethnicity. Some of these stats may be skewed due to the number of reports for each category (i.e., Guamanian victims accounted for less than 1% of all crimes reported, so they have a smaller pool of victims). Focusing on White, Black, and Latine populations specifically, they tend to average in the 9 - 12 day range of reporting for each population group.
LAPD has done significant work in becoming more transparent with some of its reporting. The data shows us some interesting trends, such as:
Women tend to be victims of intimate partner crimes and sexual assault.
Black folks tend to have higher victim rates per population in LA, compared to White and Latine folks.
Stolen Vehicles, Assault/Battery, Vandalism, Identity Theft, and Burglary are the are the most commonly reported crimes in LA.
Most of the trends we saw correlate to state and national statistics, so should not be too surprising. To conclude this analysis, LAPD should shift focus from what the data currently reports, to what it doesn't. The data gives us some information about victims of crimes, but this data could be enhanced by doing the following:
Change Sex to Gender: assigned-at-birth sex is not something officers can judge on the spot, and if it's the only data collected, may exclude or minimize crimes against non-binary or transgender persons. Gender should be self-disclosed by the victim in question, and documented according to the individual.
Allow for 3 Gender options: Man, Woman, Non-binary. There should also be an optional checkbox above the gender question for "Business." Businesses/business owners who are victims of crimes should still have their gender (and race/ethnic) identities documented, but should note if they are reporting on behalf of the business as a whole.
Collect data on sexual orientation: We do not have any stats that indicate crimes against members of the LGBTQIA+ community. Could potentially have a checkbox - "Identifies as LGBTQIA+"
Change race/ethnicity options: These options should better match the US Census. While LA is a diverse place, it's unbalanced and much harder to reconcile multiple options for some communities, and generalizations of others. Having citizens choose between US Census recognized options (including "more than one race"), will allow LAPD to better identify trends among populations that better correlate to the numbers of those populations in LA.
Additional reporting details on officers in Arrests Data: LAPD has faced a lot of scrutiny for lack of transparency and bias amongst its staff. One major part of DEI work is taking a look at the current practices of the force. It would behoove us to add data on the arresting officer (and their race & gender) when it comes to arrests, in particular. There were not many trends to be found in the arrest data (I just happened to take a peak), but adding this kind of information would be a major benefit to the force in cracking down on unconscious bias in the ranks.
Digital reporting: officers should be required to fill out all police reports digitally. One of the notes for all data provided by LAPD is that the data is transcribed from hand-written documentation. There should be very little room for error when reporting things like gender, race, etc., so I suggest allowing officers to take notes manually on the spot if that's easier, but to file all official reports digitally themselves (so they can read their own writing), and limit the choices they can make when it comes to gender and race options.
My recommended next course of action is to make these few adjustments to what data is collected (and how!). These few changes have the potential to make a lasting impact with the LA community. When it comes to DEI practices in policing, initiatives should go far beyond hiring practices and recognizing cultural holidays (both of which are great!), and should include: challenging and dismantling implicit bias and profiling; assessing current police practices and providing data on things like excessive use of force; being transparent with the public about this type of data; and, changing policies (including policies concerning transparency, culture, training, and practice). How officers view DEI can correlate to their implicit bias while on the job, and how they manage those tough, ethical decisions they face every day. If we start with the data, we can analyze a strong pathway to a culture of DEI at LAPD.
Back to Data Projects Showcase