Tracking down the Cause of Internal Server Error with AWS HTTP API Gateway

Recently I have been migrating the API of one of my original side projects, Ultimate Fantasy Supercross, to be 100% serverless. The API has moved over to .NET Core 3.1 and now it’s down to porting each endpoint into Lambda functions.

To ease the deployment pain, I chose to use the dotnet Lambda tools provided by AWS. The deploy-serverless functionality lets me describe a collection of endpoints as CloudFormation in a single project, inside a serverless.template file. To deploy it, I run dotnet lambda deploy-serverless and the entire application gets deployed.

My first endpoint was a simple GET endpoint. It had a custom authorizer in front of it that integrates with Auth0. The custom authorizer function was already deployed and confirmed working. The endpoint in my serverless.template looked like this:

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Transform": "AWS::Serverless-2016-10-31",
  "Description": "An AWS Serverless Application.",
  "Resources": {
    "LeaguesApi": {
      "Type": "AWS::Serverless::HttpApi",
      "Properties": {
        "AccessLogSettings": {
          "DestinationArn": "arn:aws:logs:us-west-2:249704159252:log-group:http-api-gateway-access-logs",
          "Format": "$context.identity.sourceIp - - [$context.requestTime] $context.httpMethod $context.routeKey $context.protocol $context.status $context.responseLength $context.requestId"
        },
        "Auth": {
          "Authorizers": {
            "Auth0LambdaArn": {
              "EnableSimpleResponses": true,
              "AuthorizerPayloadFormatVersion": "2.0",
              "FunctionArn": "arn:aws:lambda:us-west-2:249704159252:function:ufsx-auth0-authorizer-dev-auth"
            }
          }
        },
        "DefaultRouteSettings": {
          "DetailedMetricsEnabled": true
        },
        "StageName": "prod"
      }
    },
    "GetLeagues": {
      "Type": "AWS::Serverless::Function",
      "Properties": {
        "Handler": "UFSX.Api.Leagues::UFSX.Api.Leagues.Functions::Get",
        "Runtime": "dotnetcore3.1",
        "CodeUri": "",
        "MemorySize": 1024,
        "Timeout": 30,
        "Role": "arn:aws:iam::<account-id>:role/lambda-database-with-put-event-bus",
        "Policies": ["AWSLambdaFullAccess"],
        "VpcConfig": {
          "SecurityGroupIds": ["sg-12345678"],
          "SubnetIds": ["subnet-8746aa78", "subnet-428agdfg"]
        },
        "Events": {
          "RootGet": {
            "Type": "HttpApi",
            "Properties": {
              "Path": "/leagues",
              "Method": "GET",
              "Auth": {
                "Authorizer": "Auth0LambdaArn"
              },
              "ApiId": {
                "Ref": "LeaguesApi"
              }
            }
          }
        }
      }
    }
  }
}

It looks like there is a lot happening here, but that’s just the verbosity of CloudFormation. Here we define an HTTP API Gateway called LeaguesApi. This is the HTTP API offering from AWS, a cheaper and more performant alternative to their original REST API. Inside the API block, we enable access logging to a CloudWatch log group. We also define our Custom Authorizer using the ARN of the already deployed function.

Further down, we define our first function, GetLeagues. It has some VPC configuration for talking to an RDS database, as well as an Events block. Inside that block, we define the API path this function listens to. It also specifies the authorizer that sits in front of the function and the HTTP API Gateway it is connected to.

With that template set up and a basic Hello World running inside the function, I deployed it.

Why do I get an Internal Server Error when I invoke an HTTP API Gateway endpoint?

Shortly after deploying it, I tried hitting my new endpoint with the correct Auth0 token.

{"message": "Internal Server Error"}

That’s odd; my Lambda function just writes Hello World to the logs. So I went to the CloudWatch logs for both the authorizer function and the function behind my API. Looking at the logs for both Lambda functions revealed that neither one had been invoked 🤔

The troubleshooting docs tell you to enable access logs for your API, which we already did in the original serverless.template with this line:

"Format": "$context.identity.sourceIp - - [$context.requestTime] $context.httpMethod $context.routeKey $context.protocol $context.status $context.responseLength $context.requestId"

But according to the troubleshooting docs, we should also add $context.integrationErrorMessage to our access log format. This shows whether there was an integration error and, if so, what it was. An integration error would indicate that our API Gateway endpoint is not able to call our Lambda function. So I updated my access log format to add the extra variable:

"Format": "$context.identity.sourceIp - - [$context.requestTime] $context.httpMethod $context.routeKey $context.protocol $context.status $context.responseLength $context.requestId $context.integrationErrorMessage"

I hit my endpoint again after deploying the change and went straight to the access logs. The 500 was there, but the integration error message at the end of the line was empty (just a -). What is going on?

2020-12-18T10:07:44.913-08:00   8.45.151.32 - - [18/Dec/2020:18:07:44 +0000] "GET GET /leagues HTTP/1.1" 500 35 XwptqhuYPHcESYw= -

The troubleshooting docs say that a common cause of this error is that the resource-based policy on the Lambda function being called does not grant API Gateway the lambda:InvokeFunction permission. A Lambda function’s resource-based policy defines who or what can invoke it.

So I checked the resource-based policy on the Lambda function. Unsurprisingly, since SAM generated it from the Events block, it looked flawless:

{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "My-API-LambdaRootGetPermission-15CAEMBXBCPZF",
      "Effect": "Allow",
      "Principal": {
        "Service": "apigateway.amazonaws.com"
      },
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-west-2:<account-id>:function:My-API-Lambda-EUJ1YGFN0R2W",
      "Condition": {
        "ArnLike": {
          "AWS:SourceArn": "arn:aws:execute-api:us-west-2:<account-id>:by6a0ehwm6/*/GET/myendpoint"
        }
      }
    }
  ]
}

OK, so that all looked alright. So why the Internal Server Error?

Remember, there are actually two functions in play here. There is the Custom Authorizer function that sits in front of the endpoint. It authorizes the request by checking the token with Auth0. The second function is the actual API function that executes logic. We confirmed that the second function doesn’t appear to have any permission issues. But remember, we have no logs coming from either function. So neither function is being invoked by the API Gateway.

The key word here is invoked. Neither function has been invoked, and thus there are no logs.

The Custom Authorizer function sits in front of the actual API function, so the reason for the Internal Server Error must live within the Custom Authorizer. Furthermore, because it hasn’t been invoked, I can deduce that the problem is likely the resource-based policy on the Custom Authorizer.

It turns out, for my scenario, the Internal Server Error was because there was no resource-based policy on the Custom Authorizer that allowed the API Gateway endpoint to invoke it.

The Custom Authorizer is a separate function, deployed on its own and not hooked up directly to any API Gateway endpoint. Because of this, it never got any kind of resource-based policy. The solution to my specific problem was to add the following resource-based policy to my Custom Authorizer function:

{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "CustomAuthorizerAPIGateway15CAEMBXBCPZF",
      "Effect": "Allow",
      "Principal": {
        "Service": "apigateway.amazonaws.com"
      },
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-west-2:<account-id>:function:my-custom-authorizer"
    }
  ]
}

This resource-based policy grants API Gateway permission to invoke my Custom Authorizer function. Because it isn’t tied to a specific API, I can develop many APIs that all use this common authorizer without running into this permission problem again.
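If you would rather manage this permission in CloudFormation than edit the policy by hand, a minimal sketch is an AWS::Lambda::Permission resource like the one below. It assumes the template is deployed to the same account and region as the authorizer, and the logical name Auth0AuthorizerInvokePermission is hypothetical. The SourceArn is optional; here it restricts invocation to Lambda authorizers on any API Gateway API in this account and region, so every API can still share the authorizer while APIs in other accounts cannot invoke it.

```json
{
  "Resources": {
    "Auth0AuthorizerInvokePermission": {
      "Type": "AWS::Lambda::Permission",
      "Properties": {
        "Action": "lambda:InvokeFunction",
        "FunctionName": "arn:aws:lambda:us-west-2:249704159252:function:ufsx-auth0-authorizer-dev-auth",
        "Principal": "apigateway.amazonaws.com",
        "SourceArn": {
          "Fn::Sub": "arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:*/authorizers/*"
        }
      }
    }
  }
}
```

Keeping the permission in a template means it is recreated automatically if the authorizer stack is ever torn down and redeployed.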

Conclusion

This was a bit tricky to debug on the fly because of the error you are likely Googling when you encounter this. If you look up API Gateway Internal Server Error, you will most likely end up on the same troubleshooting docs referenced earlier. But those docs assume that your API Gateway endpoint is directly linked to the Lambda function (i.e. there is no custom authorizer in the middle).

These docs are valid for any situation, including ours. But the variable they recommend, $context.integrationErrorMessage, only shows errors from the integration, i.e. the Lambda function the API endpoint is trying to call. It won’t show you errors that happen before that step, such as in your authorizer. When you have a custom authorizer involved, you should also add $context.authorizer.error to your access log format, as mentioned at the very bottom of this documentation.
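As a sketch, here is the AccessLogSettings block from the original template with both variables appended to the end of the format, so authorizer errors and integration errors each get their own field in every log line. It belongs under the Properties of LeaguesApi; the destination log group is the same one used earlier, and the rest of the format is unchanged.

```json
{
  "AccessLogSettings": {
    "DestinationArn": "arn:aws:logs:us-west-2:249704159252:log-group:http-api-gateway-access-logs",
    "Format": "$context.identity.sourceIp - - [$context.requestTime] $context.httpMethod $context.routeKey $context.protocol $context.status $context.responseLength $context.requestId $context.integrationErrorMessage $context.authorizer.error"
  }
}
```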

In hindsight, this all seems obvious, but in the moment it felt a lot like trying to find a needle in a haystack. Hopefully, this post points you in the right direction if you run into something similar while getting an HTTP API endpoint working with a custom authorizer in the middle.