API Code Structuring

I have been writing APIs for many years now and whether it be for simple home projects or high-throughput production systems, I keep coming back to similar patterns when designing API endpoints. Here I will walk through what I have found to be good practices for structuring API code.

For this post, our example endpoint lets a user publish a blog article. Its overall shape looks something like this:

def create_blog(request: HttpRequest) -> HttpResponse:
    # 1. Permission checks
    # 2. In-memory variable validation
    # 3. Database-level validation
    # 4. Database changes
    # 5. Response formatting
    ...

Note also: this is all intended to be pseudo-code, not working production code. It's very Django-heavy, as Django is the framework I have the most experience with. It's intended as a guide, not gospel.

1. Permissions

Before we do anything, we check if this user has the right to access this endpoint. If not, we reject the request immediately.

def create_blog(request: HttpRequest) -> HttpResponse:

    # permissions checks
    user = request.session.get('user') if request.session else None
    if user is None:
        return HttpResponse(status=401)

    if not user.has_permission('create_blog'):
        return HttpResponse(status=403)

    # continue with normal processing
    ...

Don't include any debugging information in the response, as this could prove useful to malicious actors. Return a generic 401 (not authenticated) or 403 (authenticated but not permitted) response and leave it at that.

In Python, decorators or middlewares are common ways to implement permissions validation. Decorators allow us to target exactly which endpoints have the validation applied, while middlewares target every endpoint (including static endpoints).

Examples are given below:

Via Decorator:

from functools import wraps
from typing import Callable

def permissions_decorator(permission: str):
    def outer(func: Callable):
        @wraps(func)  # keep the wrapped view's name and docstring
        def inner(request: HttpRequest, *args, **kwargs):

            # Perform check(s) - simple example here
            user = request.session.get('user') if request.session else None
            if user is None:
                return HttpResponse(status=401)

            if not user.has_permission(permission):
                return HttpResponse(status=403)

            # checks complete, all ok - return normal response
            return func(request, *args, **kwargs)

        return inner
    return outer

# How to use
@permissions_decorator('create_blog')
def create_blog(request: HttpRequest) -> HttpResponse:
    ...

Via Middleware:

class PermissionMiddleware:
    def __init__(self, get_response: Callable):
        self.get_response = get_response

    def __call__(self, request: HttpRequest) -> HttpResponse:

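        # check permissions - this runs for every request, so a real
        # middleware would look up the permission required for request.path
        user = request.session.get('user') if request.session else None
        if user is None:
            return HttpResponse(status=401)

        if not user.has_permission('create_blog'):
            return HttpResponse(status=403)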

        # continue with normal processing
        return self.get_response(request)

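Middleware runs automatically for every request, so it suits checks that apply across the whole API. Different frameworks register and order middleware differently. In Django, the classes are listed in the MIDDLEWARE setting and run top-to-bottom on the way in, then in reverse order on the way out, so a permission middleware like this one must be listed after the session middleware it depends on.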
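To make the wrapping behaviour concrete, here is a minimal framework-free sketch of a middleware chain. The Request and Response classes, build_chain, and tracing_middleware are all hypothetical stand-ins invented for illustration, not part of Django or any other framework; only the general shape (a factory that takes get_response and returns a callable) mirrors the example above.

```python
from typing import Callable, List, Optional

class Request:
    """Hypothetical stand-in for a framework request object."""
    def __init__(self, path: str, user: Optional[str] = None):
        self.path = path
        self.user = user

class Response:
    """Hypothetical stand-in for a framework response object."""
    def __init__(self, status: int):
        self.status = status

Handler = Callable[[Request], Response]
MiddlewareFactory = Callable[[Handler], Handler]

def auth_middleware(get_response: Handler) -> Handler:
    """Reject anonymous requests before they reach the view."""
    def middleware(request: Request) -> Response:
        if request.user is None:
            return Response(401)
        return get_response(request)
    return middleware

def tracing_middleware(label: str, log: List[str]) -> MiddlewareFactory:
    """Record when a request enters and leaves this layer."""
    def factory(get_response: Handler) -> Handler:
        def middleware(request: Request) -> Response:
            log.append(f'{label}:in')
            response = get_response(request)
            log.append(f'{label}:out')
            return response
        return middleware
    return factory

def build_chain(view: Handler, middlewares: List[MiddlewareFactory]) -> Handler:
    """Wrap the view so the first middleware listed runs first."""
    handler = view
    for factory in reversed(middlewares):
        handler = factory(handler)
    return handler

def create_blog_view(request: Request) -> Response:
    return Response(201)

log: List[str] = []
app = build_chain(
    create_blog_view,
    [tracing_middleware('outer', log), auth_middleware, tracing_middleware('inner', log)],
)
anonymous = app(Request('/blogs'))              # rejected by auth_middleware
allowed = app(Request('/blogs', user='alice'))  # reaches the view
```

The anonymous request never reaches the inner layer or the view, which is the property we want from a permissions check placed early in the chain.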

2. Input Validation

Next, we validate the input values in memory. Anything that requires checking the database should be left until after this block, so that problems with the user's inputs are reported before we hit the database. This lets us return a response faster and also protects our database resources: if we run database checks too early, every malformed request costs us a query, which makes denial-of-service (DoS) attacks cheaper to mount.

Another note here is that if there are issues with multiple inputs, we should report all of them at once, rather than one-at-a-time. This provides a much better user experience and allows the user to correct their inputs once, rather than repeatedly getting rejected for reason A, then reason B, then reason C etc.

def create_blog(request: HttpRequest) -> HttpResponse:
    # permissions checks
    ...

    # begin in-memory input validation
    errors = {}
    title = request.POST.get('title')
    if not title:
        errors['title'] = 'Title is required'
    elif len(title) > 255:
        errors['title'] = 'Title must be 255 characters or fewer'

    content = request.POST.get('content')
    if not content:
        errors['content'] = 'Content is required'

    # end of validation block - report if any errors
    if errors:
        return JsonResponse({'errors': errors}, status=400)

    # continue with normal processing
    ...

Now, if I provide an empty title and content, I get the following response:

{
    "errors": {
        "title": "Title is required",
        "content": "Content is required"
    }
}

Seeing everything that needs fixing in one response matters most when submitting a request is not automated, e.g. when the client manually configures the inputs via a GUI for each attempt.
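One way to keep this block easy to test is to pull it out into a pure function that takes the submitted values and returns the errors dict. The sketch below assumes a hypothetical helper name (validate_blog_input) and a MAX_TITLE_LENGTH constant matching the 255-character limit used above; it needs no framework to run.

```python
from typing import Dict, Mapping

MAX_TITLE_LENGTH = 255  # assumed limit, matching the example above

def validate_blog_input(data: Mapping[str, str]) -> Dict[str, str]:
    """Return a dict of field -> error message; an empty dict means valid."""
    errors: Dict[str, str] = {}

    title = data.get('title')
    if not title:
        errors['title'] = 'Title is required'
    elif len(title) > MAX_TITLE_LENGTH:
        errors['title'] = f'Title must be {MAX_TITLE_LENGTH} characters or fewer'

    if not data.get('content'):
        errors['content'] = 'Content is required'

    # Every check has run, so the caller sees all problems at once
    return errors
```

In the endpoint this could be called as errors = validate_blog_input(request.POST), followed by the same "if errors" check as before.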

3. Database Validation

Database-level checks are far slower than in-memory checks (typically milliseconds rather than microseconds), so we defer them until after the in-memory validation. This also ensures that the values we send to the database have already been cleaned and validated, so our queries are not subject to odd behaviour from bad user inputs.

Common database-level checks include uniqueness checks and rate limits:

def create_blog(request: HttpRequest) -> HttpResponse:
    # permissions checks
    ...

    # in-memory input validation
    ...

    # database-level validation
    # (assumes `from datetime import timedelta` and `from django.utils import timezone`)
    errors = {}
    user = request.session['user']
    if Blog.objects.filter(user=user, title=title).exists():
        errors['title'] = 'You already have a blog with this title'

    # use ELIF so we don't run both checks - save resources
    elif Blog.objects.filter(
        user=user,
        created_at__gte=timezone.now() - timedelta(hours=24),
    ).count() >= 5:
        errors['*'] = 'You have already posted 5 blogs in the last 24 hours'

    if errors:
        return JsonResponse({'errors': errors}, status=400)

    # continue with normal processing
    ...

Unlike the in-memory checks, we may want to report database-level errors one at a time. They are more expensive to run, so when there are several of them, we can save resources by skipping the remaining checks as soon as one fails.

The cost of running additional in-memory checks is negligible (generally a few microseconds). However, the cost of running additional database checks can be multiple milliseconds, and it also puts additional load onto the database, which we should avoid if not necessary.

It's the developer's job to balance user experience (reporting all errors at once) against resource usage (running checks one at a time).
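That trade-off can be made an explicit switch. The following is only a sketch: run_checks and the two fake checks are hypothetical names, and the checks return canned failures instead of querying a database so that the example runs on its own.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each check returns (key, message) on failure, or None on success.
Check = Callable[[], Optional[Tuple[str, str]]]

def run_checks(checks: List[Check], stop_at_first: bool) -> Dict[str, str]:
    """Run checks in order, optionally stopping at the first failure."""
    errors: Dict[str, str] = {}
    for check in checks:
        result = check()
        if result is None:
            continue
        key, message = result
        errors.setdefault(key, message)  # keep the first message per key
        if stop_at_first:
            break  # skip the remaining (expensive) checks
    return errors

calls: List[str] = []  # records which checks actually ran

def duplicate_title() -> Optional[Tuple[str, str]]:
    calls.append('duplicate_title')  # a real check would query the database here
    return ('title', 'You already have a blog with this title')

def daily_limit() -> Optional[Tuple[str, str]]:
    calls.append('daily_limit')
    return ('*', 'You have already posted 5 blogs in the last 24 hours')

first_only = run_checks([duplicate_title, daily_limit], stop_at_first=True)
calls_when_stopping = list(calls)

calls.clear()
everything = run_checks([duplicate_title, daily_limit], stop_at_first=False)
calls_when_collecting = list(calls)
```

With stop_at_first=True only the first check runs; with False both run and both errors are reported, at the cost of the extra query.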

Note: in the second check here, the error is not related to a specific field, so it doesn't make sense to report it under 'title' or 'content'. Instead, I typically use "*" as a top-level key. We could instead set errors = 'You have already posted 5 blogs in the last 24 hours', and Python would allow it, but that changes the type and structure of the response payload, which creates an inconsistency for any downstream code that processes it.
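To illustrate why the consistent shape matters, here is a small sketch of a hypothetical client-side helper (format_errors is an invented name) that relies on errors always being a dict of key to message, with "*" reserved for general errors.

```python
from typing import Dict, List

def format_errors(errors: Dict[str, str]) -> List[str]:
    """List the general ('*') error first, then the per-field errors."""
    lines: List[str] = []
    if '*' in errors:
        lines.append(errors['*'])
    for field, message in errors.items():
        if field != '*':
            lines.append(f'{field}: {message}')
    return lines

display = format_errors({
    'title': 'Title is required',
    '*': 'You have already posted 5 blogs in the last 24 hours',
})
```

If the server sometimes sent a bare string instead, the call to errors.items() would fail, and every consumer would need its own special case.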

4. Database Changes

If we have no reason to reject the request, we should accept it. We make whatever changes have been asked for in the database, and we respond to the client to let them know their request has been successful.

def create_blog(request: HttpRequest) -> HttpResponse:

    # permissions checks
    ...

    # in-memory input validation
    ...

    # database-level validation
    ...

    # No reason to reject - let's create the blog
    blog = Blog.objects.create(user=request.session['user'], title=title, content=content)
    status_code = 201

    # continue as normal
    ...

5. Response Formatting

Finally, with all the data we have ready, we format the response and return it to the user. This is where we would convert any database objects into JSON-serializable formats, and also where we would add any additional metadata to the response if needed.

def create_blog(request: HttpRequest) -> HttpResponse:

    # permissions checks
    ...

    # in-memory input validation
    ...

    # database-level validation
    ...

    # database changes
    ...

    # format response
    response_data = {
        'id': blog.id,
        'title': blog.title,
        'content': blog.content,
        # the standard json module cannot serialize datetimes, so we convert to an ISO 8601 string
        'created_at': blog.created_at.isoformat(),
    }

    return JsonResponse(response_data, status=status_code)

Note: This is a simple example, where the response format is defined inside the API endpoint. More generally, we use common serialization methods that ensure any time we return a Blog object in any API response, it is always formatted the same way. This is a good practice as it ensures consistency across our API and also allows us to easily make changes to the response format in one place if needed.
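A minimal sketch of such a shared serializer is below. The Blog dataclass is a hypothetical stand-in for the real model so that the example runs without a database, and serialize_blog is an assumed name rather than anything provided by Django.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict

@dataclass
class Blog:
    """Hypothetical stand-in for the Blog model used in this post."""
    id: int
    title: str
    content: str
    created_at: datetime

def serialize_blog(blog: Blog) -> Dict[str, Any]:
    """Single place that defines how a Blog appears in any API response."""
    return {
        'id': blog.id,
        'title': blog.title,
        'content': blog.content,
        'created_at': blog.created_at.isoformat(),  # datetime -> ISO 8601 string
    }

blog = Blog(1, 'Hello', 'First post', datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
payload = json.dumps(serialize_blog(blog))
```

Each endpoint that returns a blog would then call serialize_blog(blog), so a change to the format only has to be made in one function.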