Abhay Bhargav
September 9, 2021

What is GraphQL? How attackers see opportunity in new API tech

We often come across concepts on working of any website, how data is populated on the website dynamically as per any specific customer. The best and efficient way to send or receive data from a server is by the use of APIs. API is the acronym for Application Programming Interface, which is a software intermediary that allows two applications to talk to each other. As an example, every time we make an online payment, the sender’s address, receiver's address, amount, and a custom message is sent to the server, where business logic is applied, and if and when the transaction goes well, is marked completed. Concerning APIs, the request is sent to the server with parameters as sender’s address, receiver's address, amount, and a custom message. In response, a success code is generated, and no backend business logic is revealed, making the whole process fast and efficient.With technology growing more advanced, SOAP XML and REST APIs have steadily been replaced by Big Data and AI/ML. Queries are triggered at the scale of thousands every second, and to fill that gap in technology, we’re seeing GraphQL come into the picture.Let’s dive deep into GraphQL and understand why security professionals need to learn all about it.

What is GraphQL?

GraphQL is an open-source query language developed by Facebook that can be used to create APIs as an alternative to REST and SOAP. It has gained popularity since its inception in 2012 due to the traditional flexibility it offers to those who create and call APIs. There are GraphQL servers and clients used in different languages.Rest APIs require the client to send multiple requests to different endpoints on the API to query data from the backend database. With GraphQL you only need to send one request to query the backend. This is a lot simpler because you don’t have to send multiple requests to the API, a single request can be used to gather all the necessary information.As new technologies emerge, so will new vulnerabilities. By default, GraphQL does not implement authentication—that’s on the developer to implement. This means GraphQL by default allows anyone to query it, and any sensitive information will be available to attackers unauthenticated.A Typical request of the GraphQL query syntax looks like this:

A typical request of the GraphQL query looks like this:

A typical response of the GraphQL query looks like this:

Core components of GraphQL queries

GraphQL has components in two parts, which further have specific usage:

  • Server-Side Components
  • Schema

The GraphQL schema is the backbone of any GraphQL server implementation. Describes the functionality found in the client applications that you connect to. We can use any programming language to create a GraphQL schema and build an interface around it.

  • Query

The GraphQL Query is used to read or fetch values where GraphQL Mutation is used to write or fetch values. In any of the cases, performance is a simple GraphQL server that can display and respond to it with data in a specific format. The most popular response format commonly used for mobile and web applications is JSON. Syntax:

query query_name{ someField }

  • Resolver

Resolver is a collection of functions that generate the answer to a GraphQL query. In simple terms, the solution acts as a GraphQL query holder. All GraphQL schema resolution work accepts four status issues as given below - fieldName:(root, args, context, info) => {result}

  • Client-Side Components
  • GraphiQL

GraphiQL is the reference implementation of this monorepo, GraphQL IDE.

  • Apollo Client

Apollo Client is the best way to use GraphQL to build client applications. The client is designed to help the developer quickly build a UI that downloads data via GraphQL and can be used with any front-end JavaScript.

How is it different from Rest API and SOAP XML?

Over the past decade, REST has become the (yet complex) standard for designing web APIs. It offers great ideas, like countless servers and systematic access to resources. However, REST APIs have shown great consistency to meet the rapidly changing needs of the customers they meet.GraphQL was upgraded to address the need for more flexibility and efficiency! It solves many of the errors and inefficiencies experienced by developers when communicating with REST APIs.A common pattern with REST APIs is to organize endpoints according to the ideas you have within your app. This is helpful because it allows the client to get all the necessary information about a particular point of view by simply reaching a consistent conclusion.The biggest drawback of this method is that it does not allow for rapid duplication in advance. With every change in the UI, there is a greater risk that more details (or less) are needed than ever before. As a result, the backend needs to be redesigned and account for new data needs. This kills productivity and severely delays the ability to incorporate user feedback into the product.With GraphQL, this problem is solved. Due to the flexible nature of GraphQL, changes on the client-side can be made without any additional work on the server. As clients can specify their exact data requirements, no backend engineer needs to make adjustments where the design and data require a pre-switch.

Most common ways to attack GraphQL queries

Inconsistent Authorization Checks

When checking for GraphQL programs, errors in the authorization concept are the most common problem. While GraphQL helps to use validated data validation, API developers are left alone to use authentication and authentication methods. Worse, the "layers" of common solutions in the GraphQL API make this extremely complicated - authorization checks should be available not only to query level solvers but also to developers who load additional data.Authorization functionality is handled directly by resolvers at the GraphQL API layerImplementing this comes with some security configuration to avoid any bug to pop up, for instanceAuthorization must be validated separately at each location.

  • Failing this can lead to exploitable authorization flaws.
  • It is the most common vulnerability as the complexity of API schema increases, the likelihood of it is more. 

REST Proxies Allow Attacks on Underlying APIs

When setting up the existing API for GraphQL clients, it is common to start the transition by using the new GraphQL interface as a proxy sub-surface above the internal REST APIs. The simplest implementation of this will have API problem areas simply “translate” applications into REST API format, and format the response in a way that the GraphQL client does not understand.For example, a user resolution (id: 1) can be implemented in the GraphQL proxy layer by applying it to the GET /api/users/1 backend API.If used unsafely, the attacker can modify the path or parameters transferred to the background API, presenting a limited form of server-side request fraud. For example, by providing ID 1/delete, the GraphQL proxy layer instead can access GET /api/users/1/delete with its credentials… a more destructive effect than originally intended.While this is not a good REST API design, similar situations are not uncommon in real-world applications, which often allow for the modification or retrieval of unintended information.

Missing Validation of Custom Scalars

When using GraphQL, the scalar type is the type used to represent raw data. Finally, data transferred as input or retrieved as data output by the API is not scalar.There are five built-in scalar types - Int, Float, Boolean, String, and ID.If the API developer uses its scalar type, they are responsible for performing any sanitization installation and type validation to be performed.This basic set of scalar types is sufficient for most simple APIs, but in cases where other raw databases are used, GraphQL includes application developer support to define their scalar types.Example: an API could include its DateTime scalar type or an expandable scalar type that provides extended input authentication.In JavaScript implementation, this is done using the parseValue and parseLiteral functions, which create an input from the JSON representation and the representation of the syntax tree from GraphQL, respectively.Safely performing these functions to reject invalid input is essential to maintaining the security of the type provided by GraphQL. The use of the same library ensures the security of the type provided by GraphQL for convenience.

Failure to Appropriately Rate-limit

The growing complexity of GraphQL APIs makes implementation limited and other performance protection easy. While with the REST API each HTTP request performs one action, the GraphQL query can take many steps for no reason, and take up a large number of illegal server resources.As a result, similar standard mitigation techniques used for REST APIs - simply limiting the number of HTTP requests received - are usually insufficient to protect the GraphQL API.One common source of high-level complex questions is a natural result based on a GraphQL specification graph. If there is a loop in the relationship between the two types of objects, it is usually possible to make short questions immediately for a complex operation.

Introspection Reveals Non-public Information

Often, it is intimidating to add “hidden” API endpoints that offer improper functionality to the general public. This can be a hidden administrative function, or an API to simplify a server in a setup connection.When accessed without proper authorization, this is not a good practice for REST-based APIs, but a GraphQL feature called introspection makes finding the hidden endpoints much easier.As part of an easy-to-use developer effort, the input feature, which is automatically enabled for most GraphQL functionality, allows API clients to vigorously ask for details about the schema, including texts and types of all queries and modifications defined in the schema.This is used by development tools, such as GraphiQL IDE, to retrieve the schema if it is not provided. When installed in a public API, logging in can greatly improve the developer experience.

Defending GraphQL queries

Securing queries is a task needed to be done at the topmost priority else it can lead to a breach of data to any extent. It is important to understand that the following points are ideal things to consider while implementing queries, a separate checklist needs to be evaluated.

1. Authentication

Before we can properly control data access, we must authenticate the user. There are many ways to verify authentication, including HTTP headers and JSON web tokens.The authentication needs for your schema may require you to put nothing more than { loggedIn: true } into context.

2. Authorization

It is important to implement because not all data is meant to be served to any request. APIs can be authorized in one of the following ways:

  • API-wide authorization:

Once we have the information about the user requesting, the most basic thing we can do is to deny them the ability to ask a question at all depending on their roles. We will start with this idle form of authorization because it is a very basic one.

We should only use this method in restricted areas that do not provide public access to the API or anything, such as an internal tool or an independent microservice that should not be disclosed to the public.

  • In resolver:

GraphQL provides the most granular control of data. On GraphQL servers, individual field resolvers can evaluate user roles and decide what to return each user.

  • With custom redirectives:

Another way to do authorization is with GraphQL Schema Directives. A directive is a character prefix @ character, voluntarily followed by a list of named arguments, which can appear after any syntax in GraphQL query or schema languages.

  • Outside GraphQL:

If you're using a REST API that has built-in authorization, like an HTTP header, you have another option. Instead of doing any authentication or authorization work in the GraphQL layer (in the resolver/model), it is possible to simply pass through headers or cookies to your REST endpoint and let it do the work.

3. Limit query depth

If we could prevent clients from abusing the depth of questions like this? Knowing your schema may give you an idea of ​​how deep a legitimate question can go. This is possible to use and is often referred to as Maximum Query Depth.By analyzing the abstract syntax tree (AST) query, the GraphQL server can reject or accept an application depending on its depth.Take for example a server configured with a Maximum Query Depth of 3, and the following question document.

4. Rate limiting APIs

Rate limiting helps us to narrow the user if the set limit of requests for each time is exceeded. These steps help your app prevent spam attacks and API queries.

  •     By user for a time frame
  •     By IP address in a time frame
  •     By a combination of above
  •     Limiting request on resolver

5. Turn off introspection in production

Leaving introspection on in production is like serving a meal with the recipe used to make it. While it may be possible for bad actors to learn to write malicious questions by re-engineering your GraphQL API with a lot of effort and error, disabling self-testing is an invisible security feature.

Case Studies #1

URL: #489146 Confidential data of users and limited metadata of programs and reports accessible via GraphQLVictim Organization: Hacker OneDate of Incident reported: 2019-01-31Attack issue: Confidential data of users and limited metadata of programs and reports accessible via GraphQLBounty awarded: US$20,000Detail explanation:Hacker One confirmed that two researchers were able to get confidential information from the final point in GraphQL. This vulnerability was carried out first on December 17th, 2018, and was caused due to a move back to the class-based implementation of GraphQL types, conversions, and connections.When a GraphQL query is redesigned and decompiled into one or more SQL queries, it throws its effect into the old list and uses annotation-level authorization to scrape all the data the current user is not authorized to see. Analysis of the root causes showed that this method of code was only followed when nodes were asked about the edge field. Preventative Measure:

  1. Consider using GraphQL hooks for built-in authorization reductions to catch more cases.
  2. Break the flow when an unexpected object is returned to resolve the connection field.
  3. Reduce the complexity of connection type resolution.

Case Studies #2

URL: #885539 Private list members disclosure via GraphQLVictim Organization: TwitterDate of Incident reported: 2020-05-29Attack issue: By chaining the timing attack and broken rate limit with a vulnerable GraphQL query, it was possible to read members of the private list via GraphQL.Bounty awarded: US$20,000Detail explanation:Twitter uses a separate GraphQL endpoint, which can only use questions defined by Twitter. However, there is an error in the backend. This danger requires the power of ice, snowflake brute force but it is not possible. Snowflake is made up of a timestamp, a sequence id, and a work id. The id sequence is a0 because it will automatically reset at the beginning of every millisecond. Worker id can have 2 ^ 10 = 1024 different values.So, you need to send 1024 1000 = 1024000 applications to make a strong snowflake that makes for a second.1024000 60 = 61440000 applications will be enough to force a list created in a minute.Impact:Leakage of private list members.