SCORE Bot: Shift Left, at Scale!

tl;dr Vidhu Jayabalan and Laksh Raghavan present SCORE Bot, PayPal’s light-weight, continuous code scanning tool that hooks into their CI/CD pipeline.

Vidhu Jayabalan, Security Architect, PayPal Inc.
Laksh Raghavan, Head of AppSec & Innovation, PayPal Inc.
AppSec USA 2018

High-level Thoughts

This talk is another one in the “we built a security tool that hooks into our CI/CD pipeline” space.

Pretty much every company I speak with is either building, using, or looking for a security automation platform that performs security checks on every commit, which clearly marks this concept as one of the core themes in the DevSecOps / security automation space.

If you’re interested in more talks like this, see slide 15 of my BSidesSF 2019 talk, where I reference four other talks about building continuous scanning tools and integrating them into the CI/CD pipeline, from Twitter, Salesforce, Netflix, and Coinbase.

In my opinion, the uniquely valuable parts of this talk are the speakers’ emphasis on methodology and data-driven decisions, A/B testing security, and focusing on security team agility.

Want the tl;dr? Jump straight to the Key Takeaways.

Talk Structure

This talk has the following structure:

  1. Background - goals for the project, PayPal’s environment, and guiding principles.
  2. SCORE Bot Architecture - an overview of how SCORE Bot works and why.
  3. Live Demo showing SCORE Bot working end-to-end.
  4. Results - both qualitative and quantitative.
  5. Q&A

Background

At the outset of this work, the PayPal team defined their goal as the following:

Objective: Reduce the number of vulnerabilities in our products over time, by building repeatable/sustainable proactive security practices embedded within our Product Life-Cycle.

Note that they describe this as a process and a journey, not a point-in-time effort, and that the goal is to focus on changes that are not one-offs but that raise the security bar permanently over time.

PayPal’s Environment

Not including acquisitions, PayPal has 4000+ developers in the core Product Development organization.

They have custom application frameworks across a wide variety of programming languages, ranging from old CGI-bin C++ web applications to NodeJS applications.

Previously, when code reviews happened, it was mostly around style and functionality.

How do we bring security to the table during code review?

Goal: Find PayPal-specific Issues

How do we find PayPal-specific vulnerabilities and violations of internal security standards? (Not generic flaws, like XSS, CSRF, etc.)

PayPal has a number of custom frameworks, libraries, and APIs. These may contain vulnerabilities of their own, some of them may be deprecated (and the team wants to ensure developers use the latest version), and there may be issues related to PayPal’s specific logging or crypto APIs.

There are also a number of code patterns that may not technically be vulnerabilities, but they are deviations from internal security standards that they’d like to flag as well.

At the time of this presentation, SCORE Bot had 25 custom checks targeting (as described by Laksh in the Q&A section):

Mostly around our own custom frameworks, libraries, APIs, logging infrastructure, stuff like that.

Challenges with Existing Solutions

SAST tools are too heavy-weight and complex to customize, and they found keeping SAST tools up to date across versions was too messy.

IDE plug-ins are messy to maintain at PayPal’s scale and complexity.

In theory, IDE plug-ins are a great idea and have worked for some companies. When they tried, however, it was incredibly complex to maintain and update at scale, given the diverse set of languages and IDEs used by their developers. Dealing with multiple versions of each IDE and plug-in compatibility issues was just too much work.

Guiding Principles

They designed SCORE Bot based on the following principles:

1. It has to be developer friendly.

Developers should not have to learn a new tool, install or configure something, or have to remember to use it. The tool must also be empathetic to developers’ workflow.

2. Solve for scale

The tool needs to a) keep up with the hundreds of prod deployments PayPal makes per day and b) handle hundreds of components across diverse programming languages and frameworks.

3. Performance is important

Developers need feedback in seconds, not minutes to hours.

4. Behavioral science

Influencing developer behavior is ultimately a people problem, so the PayPal security team wanted to make experimentation, such as A/B testing, a core feature of SCORE Bot.

When SCORE Bot finds a given vulnerability, it can send out differently worded messages and determine whether one gets a better response from developers.

What do we show native English speakers living in the Bay Area vs engineers living in China whose first language is not English? Is there a difference? Let’s run different campaigns and see what works.
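A campaign like this could be sketched roughly as follows. This is a minimal illustration, not SCORE Bot's actual implementation: the message wordings, the campaign name, and the `assign_variant` and `fix_rates` helpers are all hypothetical. The idea is to bucket each developer deterministically, so they always see the same wording, and then compare how often each wording led to a fix.

```python
import hashlib
from collections import Counter

# Hypothetical wordings for the same finding (invented for illustration).
VARIANTS = {
    "A": "Deprecated crypto API detected. Please fix before merging.",
    "B": "Deprecated crypto API detected. See the linked guide for a step by step fix.",
}


def assign_variant(developer_id, campaign):
    """Deterministically bucket a developer so they always see the same wording."""
    digest = hashlib.sha256(f"{campaign}:{developer_id}".encode()).hexdigest()
    names = sorted(VARIANTS)
    return names[int(digest, 16) % len(names)]


def fix_rates(events):
    """events: iterable of (variant, was_fixed) pairs. Returns the fix rate per variant."""
    shown = Counter()
    fixed = Counter()
    for variant, was_fixed in events:
        shown[variant] += 1
        if was_fixed:
            fixed[variant] += 1
    return {variant: fixed[variant] / shown[variant] for variant in shown}
```

Hashing the campaign name together with the developer ID means a new campaign reshuffles the buckets, so the same developers are not always placed in the same group.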

Case Study: The Wording in Security Emails Matters

The PayPal security team built a small online security training module for developers and emailed a number of developer teams about it. The clickthrough rate was abysmal.

The security team made one change in the email campaign: they added how long the online training was (only ~4 minutes). This small change significantly improved subsequent clickthrough rates! They hypothesize this is because when developers saw the original email they weren't sure how long the training would be and thus put it off. But with this email, "Hey, 4 minutes? I can do that."

5. Maximize security rule iteration speed

The PayPal security team wanted to minimize the timeline from identification of a vulnerability to when they have a rule in CI/CD that flags the issue for all new development. Ideally this whole process should be measured in minutes to hours.

Writing custom rules for traditional SAST tools and tuning them based on scan feedback takes too long.

Key Insight: Maximize Security Iteration Speed

I think this is an excellent point worth emphasizing. One of the key benefits of Agile and DevOps, and why they've de-facto killed Waterfall, is that being able to iterate rapidly is so incredibly valuable. You can ship valuable features to customers faster, receive and incorporate their feedback faster, and if a bug is discovered? No problem, we can ship a quick fix, we don't need to wait until our next quarterly release.

While security teams often focus on the important aspects you'd expect, like speed, scalability, effectiveness of the analysis used, and so forth, in my experience it's much less common for security teams to focus on maintaining and improving their agility. I think we can learn a valuable lesson from developers here.

Reflect for a moment:

  • How valuable would it be if you could notice a common code anti-pattern and then in an hour write up a quick check and roll it out to every repo such that you got coverage on every commit from now on?
  • How useful would it be to be able to write a new check, get feedback on its effectiveness, tune it to improve signal, and have multiple rounds of that feedback-driven improvement loop take minutes or at most hours, not days or weeks?
  • What if you could add a security tool to (or remove one from) your CI/CD pipeline in an hour?
  • Is there any security automation you're not doing because you know that rolling it out or tuning it will be too time intensive or painful?

SCORE Bot Architecture

The overall SCORE Bot workflow is similar to most other tools in this space:

[Figure: SCORE Bot architecture]

  1. A developer creates a new pull request (PR).
  2. SCORE Bot is asynchronously informed about the PR via a webhook.
  3. SCORE Bot scans for a number of PayPal-specific security issues.
  4. If issues are found, SCORE Bot comments on the PR and sends the same info via email to the developer. Metrics are stored in a separate DB.
    • The comment includes what the identified issue is, its code location, a summary of the issue in the context of the code, and most importantly, a link with step by step details about how to fix it.
  5. The developer then fixes the code or files an exemption.
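The steps above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not PayPal's code: the event shape (an `author` plus a `files` mapping of path to `(line_number, text)` pairs), the `LegacyCrypto` API name, the fix URL, and the injected `post_comment`, `send_email`, and `record_metric` callables are all hypothetical stand-ins for the code review system, mail system, and metrics DB.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: re.Pattern
    summary: str
    fix_url: str


@dataclass(frozen=True)
class Finding:
    rule: Rule
    path: str
    line_no: int


# Hypothetical rule: the API name and URL are placeholders, not PayPal's.
RULES = [
    Rule(
        rule_id="deprecated-crypto",
        pattern=re.compile(r"\bLegacyCrypto\."),
        summary="Call to a deprecated crypto API.",
        fix_url="https://example.invalid/fix/deprecated-crypto",
    ),
]


def scan_pr(files):
    """Step 3: check every changed line against every rule.

    files maps a path to a list of (line_number, text) pairs.
    """
    return [
        Finding(rule, path, line_no)
        for path, lines in files.items()
        for line_no, text in lines
        for rule in RULES
        if rule.pattern.search(text)
    ]


def render_comment(findings):
    """Step 4: say what was found, where, and how to fix it."""
    return "\n".join(
        f"{f.path}:{f.line_no} [{f.rule.rule_id}] {f.rule.summary} "
        f"How to fix: {f.rule.fix_url}"
        for f in findings
    )


def handle_pull_request(event, post_comment, send_email, record_metric):
    """Steps 2-4: called when the webhook delivers a PR event."""
    findings = scan_pr(event["files"])
    record_metric(len(findings))  # metrics go to their own store
    if findings:
        body = render_comment(findings)
        post_comment(body)
        send_email(event["author"], body)
    return findings
```

Passing the side effects in as callables keeps the scanning logic independent of any particular code review or mail system, which also makes it easy to exercise in a unit test.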

This approach fulfills the principles outlined above:

  • Developers receive feedback in near real-time, so they’re still in the mindset of the current PR. In contrast, feedback from heavy-weight SAST tools arrives the next day or at the end of the week, by which point the developer has mostly forgotten the context and moved on.
  • Developers don’t have to manually invoke SCORE Bot or remember to do anything outside of their normal workflow. SCORE Bot seamlessly integrates with how they’re writing code already.

Enforcement with Empathy: Manual Waiver Flow

The speakers emphasize that it’s important to be empathetic, balancing security and developer experience.

When a PR with insecure code is identified by SCORE Bot, the merge PR button is disabled to encourage the developer to fix the code before merging. The developer can initiate a waiver from within the code review system to re-enable the merge button in real time, so they’re not blocked if, for example, they need to deploy a hotfix.

However, for certain critical risk findings, the security team retains the right to block releases associated with that commit if the finding has not been addressed.
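The two-level policy described here (a waivable merge gate, plus a release block reserved for critical findings) could be modeled as below. This is only a sketch: the severity levels, the `OpenFinding` type, and both function names are assumptions made for illustration, and the talk does not describe how SCORE Bot represents findings internally.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical severity scale; the talk only mentions "critical risk findings".
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class OpenFinding:
    """A finding that has not been fixed in code (it may still have a waiver)."""
    rule_id: str
    severity: Severity
    waived: bool = False


def merge_allowed(open_findings):
    """The merge button is enabled once every unfixed finding has a waiver."""
    return all(f.waived for f in open_findings)


def release_blocked(open_findings):
    """Security can still block the release while a critical finding is unfixed.

    A waiver re-enables merging but does not count as addressing the finding.
    """
    return any(f.severity is Severity.CRITICAL for f in open_findings)
```

Keeping the two decisions separate reflects the balance in the talk: a developer shipping a hotfix is not stuck at merge time, while the security team keeps a backstop for the most serious issues.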

Demo (starts @ 16:30)

The speakers demo SCORE Bot end-to-end, showing how it can detect when a developer uses a PayPal-specific crypto API that has been deprecated.

Interestingly, they’ve also created a tool, devrunner, that developers can run locally before creating a PR. It runs the CI/CD unit tests, functional tests, and security scans (SAST, DAST, SCORE Bot, software composition analysis).

SCORE Bot Demo: Code fails check
SCORE Bot check fails because an issue was detected.
SCORE Bot Demo: Bot comment
SCORE Bot’s comments on a PR, including a nice description of what issue was detected, some background context, and how to fix it.
Create Exemption
When required, developers can create an exemption within SCORE Bot that allows them to merge anyway, for example, when fixing a P1 issue.

What was the Security ROI?

Benefits

The benefits of SCORE Bot include:

  • End-to-end visibility into PayPal-specific vulnerabilities across all repos and tech stacks.
  • Vulnerability patterns emerge, enabling security to reach out to specific dev teams and offer focused training. For example, perhaps there’s a team that needs help upgrading to the latest version of some dependency.
    • This allows the AppSec team to proactively reach out to dev teams, engage them 1:1, understand their pain points, and help them resolve it.
  • Creating a security culture - the AppSec team can’t review every PR or attend every scrum meeting, but SCORE Bot gives security a seat at the table whenever code reviews occur.
Developer Feedback
They’ve received a lot of positive feedback about SCORE Bot. Some negative feedback too, but that’s helped them improve it.

Real-world Data

SCORE Bot: Data and Results
SCORE Bot data over time

From April 2017 to August 2018 the number of C++ findings has trended downward, indicating that developers are not repeating the same mistakes. This is scaling security!

As the targeted bug classes are brought under control, PayPal’s AppSec team can add new checks and gradually improve the maturity of the development org over time.

Armchair theorizing: Given that the average SCORE Bot response time is < 1 second, these checks must be either greps or comparisons against specific file hashes, as even parsing, let alone data-flow analysis, can’t be that fast.
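To make that guess concrete, here is a minimal sketch of what a grep-style check over a PR's unified diff might look like. It illustrates the speculation above rather than anything the speakers showed: the `LegacyCrypto` pattern is a made-up stand-in for a deprecated API, and `added_lines` and `grep_diff` are hypothetical helpers. Only added lines are checked, so a finding always points at code the PR introduces.

```python
import re

# Matches a unified diff hunk header such as "@@ -1,3 +1,4 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

# Hypothetical rule: flag calls into a made-up deprecated API.
RULE = re.compile(r"\bLegacyCrypto\.")


def added_lines(diff_text):
    """Yield (path, new_line_number, text) for each added line in a unified diff."""
    path = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_no = 0
    for raw in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if raw.startswith("+"):
                yield path, new_no, raw[1:]
                new_no += 1
                new_left -= 1
            elif raw.startswith("-"):
                old_left -= 1
            elif raw.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:  # context line, present in both old and new file
                new_no += 1
                old_left -= 1
                new_left -= 1
            continue
        if raw.startswith("+++ "):
            path = raw[4:].removeprefix("b/")
        else:
            match = HUNK_HEADER.match(raw)
            if match:
                # An omitted count in a hunk header means 1.
                old_left = int(match.group(1) or 1)
                new_no = int(match.group(2))
                new_left = int(match.group(3) or 1)


def grep_diff(diff_text, pattern=RULE):
    """Return the added lines that match the rule's pattern."""
    return [hit for hit in added_lines(diff_text) if pattern.search(hit[2])]
```

A check like this is a single linear pass over the diff with one regex search per added line, which is consistent with sub-second response times. The trade-off is that it has no understanding of the code, so it suits "is this API used at all" rules far better than anything requiring data flow.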

Key Takeaways

Wrapping up: Alright, let’s summarize the key points of this talk.

SCORE Bot Architecture: SCORE Bot receives async webhooks when new PRs are created and then scans the diff for PayPal-specific security issues and best practice violations. If any are found, SCORE Bot comments on the PR, emails the developer, and stores metrics.

  • SCORE Bot’s comment includes what the identified issue is, its code location, a summary of the issue in the context of the code, and a link with step by step details about how to fix it.
  • SCORE Bot currently has 25 rules (that are likely greps) covering PayPal’s custom frameworks, libraries, APIs, logging infrastructure, etc.
  • Developers can file an exemption to still merge a PR when required, for example, to fix a P1.

SCORE Bot’s approach is valuable because it’s fast (developers receive feedback when they’re still in the mindset of the current PR) and automatic (SCORE Bot runs automatically within developer’s normal workflow).

PayPal chose the SCORE Bot approach over SAST tools and IDE plug-ins because SAST tools are too heavy-weight and complex to customize, and because maintaining IDE plug-ins for the variety of editors PayPal engineers use is not feasible.

SCORE Bot’s benefits include giving the AppSec team end-to-end visibility into PayPal-specific vulnerabilities across all repos and tech stacks, surfacing vulnerability patterns so that security can offer focused training, and helping create a security culture by scanning every commit, which keeps security top of mind.

Unique insights from this talk include:

  • A/B test security - small wording or presentation changes can cause a significant difference in developer behavior.
  • Maximize security iteration speed - build security tools and processes with an eye towards speeding up the build -> get feedback -> iterate loop.

Stay in Touch!

If you have any feedback, questions, or comments about this post, please reach out! We’d love to chat.