Coder Thoughts on software, technology and programming.

Piotr Mionskowski

  • Request timeouts in Spring MVC

    21 November 2017

    Last time we reviewed how to configure HTTP client timeouts. This time let us focus on the other side of the HTTP request, i.e. the server. There is pretty much always a thread pool involved when we write a Spring MVC application. The thread pool configuration will vary depending on the particular servlet container (Tomcat, Undertow, Jetty), so we have to watch out for subtle differences. However, most if not all of them will use a thread pool with a fixed maximum size. As we already know, a pool of resources might get exhausted. In particular, a thread pool is more likely to get exhausted if we do not control timeouts diligently.

    Threads involved in a Spring MVC request handling

    A typical servlet container uses one or more thread pools to handle a request. In particular, one of these thread pools executes the Spring MVC part of request handling. Let us call it the request worker thread pool. The request worker thread pool will have a default maximum size:

    • Tomcat: server.tomcat.max-threads controlling maxThreads with a default of 200
    • Undertow: server.undertow.worker-threads controlling WORKER_TASK_CORE_THREADS with a default of availableProcessors() * 8
    • Jetty: There is currently no Spring configuration property available. One can customize the Jetty thread pool through code and Jetty-specific configuration though. The default maximum number of worker threads is 200.
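In a Spring Boot application the first two knobs are plain configuration properties; a sketch with illustrative values (property names as listed above):

```properties
# Tomcat: maximum number of request worker threads (default 200)
server.tomcat.max-threads=100

# Undertow: number of worker task threads (default availableProcessors() * 8)
server.undertow.worker-threads=64
```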

    Thread pool

    What happens when the request processing thread pool is exhausted?

    Once the request processing thread pool is exhausted, the servlet container will typically queue incoming requests. The queue is drained by the same request processing thread pool. Queued requests consume server memory and sockets, thus there is typically a limit after which a new incoming request is immediately rejected:

    • Tomcat: server.tomcat.accept-count Maximum queue length for incoming connection requests when all possible request processing threads are in use. The default value is 100.
    • Undertow: As far as I can tell, by default requests will be queued and the only bound is system capacity. There is a Request Limiting Handler available though, which allows configuring the maximum number of concurrent requests as well as a maximum queue size.
    • Jetty: By default, requests are queued using an unbounded queue. You can configure a bounded one though, as documented:
    <Configure id="Server" class="org.eclipse.jetty.server.Server">
        <Set name="ThreadPool">
          <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
            <!-- specify a bounded queue -->
            <Arg>
              <New class="java.util.concurrent.ArrayBlockingQueue">
                <Arg type="int">6000</Arg>
              </New>
            </Arg>
            <Set name="minThreads">10</Set>
            <Set name="maxThreads">200</Set>
            <Set name="detailedDump">false</Set>
          </New>
        </Set>
    </Configure>
    

    Queuing requests is necessary in most common scenarios to handle temporary spikes in load. For example, if your application can handle 100 requests per second, and you can allow it one minute to recover from excessively high load, you can set the queue capacity to 60 * 100 = 6000.

    Let us assume that the thread pool (max) size is 100 and that on average a request takes 1 second to process. Such a server can thus handle 100 requests per second (rps). Any request over that limit is going to be queued. Now imagine we have a single type of request that for some reason takes much longer to process than usual, e.g. 120 seconds due to a dependent service issue. Requests of that type will, one by one, occupy request processing threads until all of them are busy waiting. Depending on the queue size limit and system capacity, our server will soon start rejecting all new requests. It is worth noting that the slow requests are also going to be queued once the thread pool capacity is reached.
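The exhaustion-then-rejection behavior is easy to reproduce with a plain Java thread pool standing in for the container's worker pool (the pool and queue sizes here are illustrative, not any container's defaults):

```java
import java.util.concurrent.*;

public class PoolExhaustionDemo {
    public static void main(String[] args) throws Exception {
        // A "server" with 2 worker threads and room for 2 queued requests
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2));

        CountDownLatch dependency = new CountDownLatch(1);
        Runnable slowRequest = () -> {
            try { dependency.await(); } catch (InterruptedException ignored) { }
        };

        // 2 requests occupy the workers, 2 more fill the queue...
        for (int i = 0; i < 4; i++) pool.execute(slowRequest);

        // ...and the 5th incoming request is rejected immediately
        boolean rejected = false;
        try {
            pool.execute(slowRequest);
        } catch (RejectedExecutionException e) {
            rejected = true;
        }
        System.out.println("request rejected: " + rejected);

        dependency.countDown(); // "dependent service" recovers
        pool.shutdown();
    }
}
```

The same arithmetic applies to a real container: once slow requests occupy all workers and fill the queue, every new request fails fast.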

    One of the ways to mitigate the issue and speed up system recovery is to apply timeouts. When a timeout for a particular request elapses, ideally a few things should happen:

    • the client should be notified about the error (503, 504 or 408 depending on the use case)
    • the request should be removed from the processing queue
    • the thread processing the request should be interrupted

    Let’s review what options are available.

    • Tomcat has Stuck Thread Detection Valve:

      This valve allows to detect requests that take a long time to process, which might indicate that the thread that is processing it is stuck. Additionally it can optionally interrupt such threads to try and unblock them. The valve has 2 configuration options:

    • threshold: Minimum duration in seconds after which a thread is considered stuck. Default is 600 seconds.
    • interruptThreadThreshold: Minimum duration in seconds after which a stuck thread should be interrupted to attempt to “free” it.

    As far as I know, the valve only applies to requests whose processing has already been started by the thread pool; requests still waiting in the queue are not affected.
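Enabling the valve is a one-liner inside the `<Host>` element of Tomcat's server.xml; a sketch with illustrative thresholds:

```xml
<Valve className="org.apache.catalina.valves.StuckThreadDetectionValve"
       threshold="60"
       interruptThreadThreshold="120" />
```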

    • Undertow and Jetty do not allow setting a request timeout directly. Both have idle connection detection and can time such connections out accordingly. Unfortunately, since HTTP/2 multiplexes many requests onto one connection, these timeout options may not be suitable for timing out a single request.

    • In Spring MVC there is no way to configure a timeout unless you use asynchronous request processing. With an async method one can use spring.mvc.async.request-timeout= to set the amount of time (in milliseconds) before asynchronous request handling times out. However, using the Async Servlet with Spring MVC requires changing the controller methods' return types.
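For completeness, the async timeout mentioned above is a single property; note it only affects controller methods with asynchronous return types such as Callable or DeferredResult:

```properties
# Timeout (ms) for asynchronous request handling; has no effect on blocking controllers
spring.mvc.async.request-timeout=5000
```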

    There is no standard request timeout configuration

    There are only a couple of options available to encompass request handling with a timeout. This is partially due to historical reasons: the servlet container specification did not consider timeouts until the Async Servlet was defined. Another reason is that there is no way to safely stop a thread that a framework could use; the application code needs to cooperate to safely terminate the request handling execution. In the next post we will show how to add a request timeout to a Spring MVC application.

  • HTTP client timeouts

    05 November 2017

    We have already touched upon the importance of timeouts and described the most important related JDBC knobs. The next aspect of timeouts I would like to focus on is API clients, specifically HTTP clients, which are by far the most popular. We will review a couple of popular HTTP client libraries and their configuration regarding timeouts.

    Waiting

    HttpURLConnection timeouts

    HttpURLConnection, available since JDK 1.1, gained the ability to timeout its network communication in JDK 5. The 2 available timeouts, setConnectTimeout and setReadTimeout, control how long to wait until a connection is established and how long to wait for data from the server, respectively. The default values are infinite ‼️.
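A minimal sketch in plain Java (httpbin.org as a stand-in URL; nothing is actually sent until connect() is called):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class UrlConnectionTimeouts {
    public static void main(String[] args) throws Exception {
        HttpURLConnection connection =
                (HttpURLConnection) new URL("http://httpbin.org/get").openConnection();

        // Both settings default to 0, i.e. wait forever -- always override them
        connection.setConnectTimeout(5_000); // max time to establish the TCP connection
        connection.setReadTimeout(10_000);   // max time to wait for data once connected

        System.out.println(connection.getConnectTimeout() + " " + connection.getReadTimeout());
    }
}
```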

    Apache HttpClient timeouts

    HttpClient from the Apache HttpComponents suite has been a standard choice for HTTP communication. It is a mature project with a rich API that fills many HttpURLConnection shortcomings, e.g. connection pooling. Many of its APIs have been deprecated (e.g. DefaultHttpClient, org.apache.http.params.CoreConnectionPNames), hence one needs to be careful when setting the timeouts; otherwise they fall back to system-defined socket-level defaults.

    There are 3 timeout settings available:

    val requestConfig = RequestConfig.custom()
        // Determines the timeout in milliseconds until a connection is established.
        .setConnectTimeout(5_000) 
        // Defines the socket timeout in milliseconds,
        // which is the timeout for waiting for data or, put differently,
        // a maximum period of inactivity between two consecutive data packets.
        .setSocketTimeout(5_000)
        // Returns the timeout in milliseconds used when requesting a connection
        // from the connection manager.
        .setConnectionRequestTimeout(2_000)
        .build()
    

    The requestConfig can be further used as a default for an HttpClient instance:

    val httpClient = HttpClients.custom()
        .setDefaultRequestConfig(requestConfig)
        .build()
    

    It is also possible to configure each request separately:

    val get = HttpGet("http://httpbin.org/get").apply { 
        config = requestConfig
    }
    httpClient.execute(get)
    

    OkHttp

    OkHttp is my favorite HTTP & HTTP/2 client for Android and Java applications. It is efficient and has good configuration defaults. There are 3 timeout settings available:

    val client = OkHttpClient.Builder()
        // Sets the default connect timeout for new connections.
        .connectTimeout(5, TimeUnit.SECONDS)
        // Sets the default read timeout for new connections.
        .readTimeout(10, TimeUnit.SECONDS)
        // Sets the default write timeout for new connections.
        .writeTimeout(20, TimeUnit.SECONDS)
        .build()
    

    connectTimeout, readTimeout and writeTimeout all default to 10 seconds 👍.

    XMLHttpRequest and Fetch API timeouts

    XMLHttpRequest has been the standard foundation of network communication of Web applications for over 10 years now. Nowadays it is being replaced by the Fetch API, but it still is, and will continue to be, the most popular for a couple more years. There is only a single timeout configuration available in XMLHttpRequest:

    The XMLHttpRequest.timeout property is an unsigned long representing the number of milliseconds a request can take before automatically being terminated. The default value is 0, which means there is no timeout.

    Default is infinite ‼️

    Since no default value is configured we should diligently set the timeout in our code! It may be tempting to think that a client-side timeout is not so important compared to the one on the server. That is a questionable attitude to say the least. We need to keep in mind that there is a hard limit on the number of connections a browser will make to a single domain, which is very important if we use HTTP 1.*. When we reach the maximum number of concurrently opened connections, any new XMLHttpRequest is going to be queued indefinitely. The limit varies between browsers, and the recent RFC relaxes it. HTTP/2 alleviates the issue with connection multiplexing, nonetheless its adoption is still low; according to w3techs it is about 20% as of today. The timeout value used in XMLHttpRequest is even more important in Single Page Applications. In an SPA, an XMLHttpRequest without a timeout can live for as long as the server and intermediate network parties allow, effectively blocking all subsequent network calls.

    Fetch API is meant to replace XMLHttpRequest. It is thus sad that the ability to timeout a request has not yet made it into the standard; currently there is no standard way to enforce a timeout. There are a couple of active GitHub issues, Add timeout option and Add option to reject the fetch promise automatically after a certain time elapsed, that go over potential solutions. There was a proposal for cancelable promises, which was withdrawn after lots of discussion and lack of consensus. A brand new way, recently implemented by Edge and Firefox, allows one to timeout a Fetch API call 🎉 through the DOM-standardized AbortController. Hopefully it will get into the Fetch API standard soon.

    const controller = new AbortController();
    const signal = controller.signal;
    
    // abort the request if it has not completed within 5 seconds
    setTimeout(() => controller.abort(), 5000);
    
    fetch(url, { signal }).then(response => {
      return response.text();
    }).then(text => {
      console.log(text);
    }).catch(err => {
      // an aborted fetch rejects with an "AbortError" DOMException
      console.error(err.name);
    });
    

    URLSession timeouts

    URLSession is the successor to NSURLConnection that underlies most if not all iOS HTTP clients, e.g. Alamofire. There are 2 main timeout values to configure, both of which have default values available via URLSessionConfiguration.default:

    let sessionConfig = URLSessionConfiguration.default
    sessionConfig.timeoutIntervalForRequest = 20.0
    sessionConfig.timeoutIntervalForResource = 40.0
    
    let session = URLSession(configuration: sessionConfig)
    

    Fortunately there are default values configured:

    • timeoutIntervalForRequest:

      This property determines the request timeout interval for all tasks within sessions based on this configuration. The request timeout interval controls how long (in seconds) a task should wait for additional data to arrive before giving up. The timer associated with this value is reset whenever new data arrives. The default value is 60.

    • timeoutIntervalForResource:

      This property determines the resource timeout interval for all tasks within sessions based on this configuration. The resource timeout interval controls how long (in seconds) to wait for an entire resource to transfer before giving up. The default value is 7 days.

    Note that timeoutIntervalForResource is a higher level timeout than what we have considered in the other HTTP clients. It encompasses retries and/or individual request timeouts, hence its large default.

    Summary

    Many HTTP clients do not have a good default timeout configuration. Hence, if you care about your application's resource usage and system stability, you have to carefully review and configure timeouts where applicable. It is reassuring to see that modern HTTP clients, e.g. OkHttp and URLSession, have short but sane defaults.

  • Database timeouts

    31 October 2017

    Last time I outlined the importance of timeouts. Without carefully considered timeouts our application can easily become unresponsive. In this post I will focus on configuring various timeouts related to interacting with a database. I am going to focus specifically on relational databases; the principles and practices, however, can be applied equally well to other types of databases.

    Database

    Different kinds of timeouts

    Asking a database for the results of a query is one of the most common activities a back-end application performs. Let us decompose this simple task into steps:

    1. Establish database connection(s) in pool
    2. Take the connection out of the pool
    3. Validate the acquired connection
    4. Send statement(s) to database
    5. Read query results

    Each of the above steps can involve a specific timeout configuration. The details depend on particular technology stack and type of database we are querying.

    Establish database connection(s) in pool

    Dealing with raw database connections is almost always done with the help of a connection pool. Establishing a connection to a database is very expensive compared to running a simple statement. The pool alleviates this cost by reusing connections for as long as needed.

    The first timeout is the maximum duration until a database connection is established. In JDBC this can be controlled by:

    • connectTimeout in MySQL JDBC driver

      Timeout for socket connect (in milliseconds), with 0 being no timeout. Only works on JDK-1.4 or newer. Defaults to ‘0’.

      The default is infinite ‼️

    • socketTimeout in MySQL JDBC driver

      Timeout (in milliseconds) on network socket operations (0, the default means no timeout).

    • loginTimeout in MySQL JDBC driver

      Sets the maximum time in seconds that a driver will wait while attempting to connect to a database once the driver has been identified.

      The default value 0 means infinite ‼️

    • loginTimeout in PostgreSQL JDBC driver:

      Specify how long to wait for establishment of a database connection. The timeout is specified in seconds.

      Default is infinite ‼️

    • connectTimeout in PostgreSQL JDBC driver:

      The timeout value used for socket connect operations. If connecting to the server takes longer than this value, the connection is broken. The timeout is specified in seconds and a value of zero means that it is disabled.

    • socketTimeout in PostgreSQL JDBC driver:

      The timeout value used for socket read operations. If reading from the server takes longer than this value, the connection is closed. This can be used as both a brute force global query timeout and a method of detecting network problems. The timeout is specified in seconds and a value of zero means that it is disabled.

    You have probably noticed a recurring theme above: default timeouts are either infinite or disabled at the driver level. In the case of socketTimeout and connectTimeout there can still be a system-level timeout involved, both on Linux and Windows. However, those only apply to blocking send and receive operations, and how the JDBC driver interacts with the socket is an implementation detail for the most part.
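The driver-level knobs above can also be passed as JDBC URL parameters; a sketch with illustrative values (the MySQL timeouts are in milliseconds, the PostgreSQL ones in seconds):

```
jdbc:mysql://localhost:3306/database?connectTimeout=2000&socketTimeout=5000
jdbc:postgresql://localhost:5432/database?loginTimeout=5&socketTimeout=10
```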

    In order to demonstrate how the above timeouts work in practice we will use the following test cases:

    class JdbcTimeoutTest {
        @Test
        fun `mysql getConnection`() {
            val mysqlDataSource = mysqlDataSource()
    
            assertTimeoutPreemptively(Duration.ofMinutes(3)) {
                useResource { mysqlDataSource.connection }
            }
        }
    
        @Test
        fun `postgresql getConnection`() {
            val postgreSQLDataSource = postgreSQLDataSource()
    
            assertTimeoutPreemptively(Duration.ofMinutes(3)) {
                useResource { postgreSQLDataSource.connection }
            }
        }
    
        fun useResource(resourceProvider: () -> AutoCloseable) {
            val start = Instant.now()
            try {
                resourceProvider().use {
                    println("Completed in ${Duration.between(start, Instant.now())}")
                }
            } catch (e: Exception) {
                println("Error $e after ${Duration.between(start, Instant.now())}")
            }
        }
    
        fun mysqlDataSource(): MysqlDataSource {
            return MysqlDataSource().apply {
                this.setURL("jdbc:mysql://localhost:3306/database")
                this.user = "user"
                this.setPassword("password")
            }
        }
    
        fun postgreSQLDataSource(): PGSimpleDataSource {
            return PGSimpleDataSource().apply {
                this.user = "user"
                this.password = "password"
                this.databaseName = "database"
                this.serverName = "localhost"
            }
        }
    }
    

    To simulate a misbehaving database server we'll use netcat listening on the standard MySQL or PostgreSQL port, e.g.:

    nc -k -l 3306 # listen on MySQL port, PostgreSQL uses 5432 by default
    

    Both of the above tests will fail due to assertTimeoutPreemptively.

    The most appropriate candidate for establishing a connection is loginTimeout. This works in PostgreSQL but not in MySQL: apparently the MySQL JDBC driver in versions 5.1, 6.0 and 8.0 implements the method as a no-op. Interestingly, it is possible to force the MySQL driver to respect the timeout when it is set globally through the static method java.sql.DriverManager.setLoginTimeout.

    A slightly less correct option is to use connectTimeout or socketTimeout. The socket-level options are oblivious to the database protocol, hence it is impossible to set a timeout that accurately covers the whole establish-connection operation. Additionally, socketTimeout applies to all socket read operations, not only to the initial connection handshake. The fake netcat server is not suitable for testing connectTimeout, but we can use it for socketTimeout. The PostgreSQL driver correctly reported an error about 2 seconds after setting pgSimpleDataSource.socketTimeout = 2. Unfortunately, setting socketTimeout in the MySQL driver had no effect on the getConnection behavior. Interestingly, no matter what value I set socketTimeout to, the error was thrown after about 26 seconds. I have no idea why it behaves like that 🤔.

    Be aware of the shortcomings of MySQL JDBC Driver.

    Take the connection out of the pool

    Reusing database connections gives the application a great performance boost. However, writing an efficient and bug-free database connection pool is no easy task, thus we should all rely on proven solutions. In the JVM world there are multiple choices when it comes to JDBC:

    • Hikari: Claims to be the fastest, and has a limited number of configuration knobs and sane defaults. My favorite by far.
    • DBCP 2: A recently resurrected project which has the potential of being applicable to all resource pools with its commons-pool2 module.
    • Tomcat JDBC Connection Pool: Commonly used, with lots of configuration options. Came to be as a replacement for DBCP.

    When there are no connections available in the pool, the code asking for one needs to wait until one is available. The amount of time a thread is blocked waiting for a connection needs to be considered carefully. There are 2 situations to consider. The first is when the pool has reached its maximum size and all connections are already in use; here very little actual work is required to acquire a connection once one is returned to the pool. The second is when all currently opened connections are in use but the pool is allowed to create a new one because it is not yet full. Here we need to keep in mind that establishing a connection to the database may easily take around 200 ms, hence the timeout should not be too short. Below you'll find how to configure the timeout in the mentioned connection pools:

    • Hikari: connectionTimeout

      This property controls the maximum number of milliseconds that a client (that’s you) will wait for a connection from the pool. If this time is exceeded without a connection becoming available, a SQLException will be thrown. Lowest acceptable connection timeout is 250 ms. Default: 30000 (30 seconds)

    • DBCP: maxWaitMillis

      The maximum number of milliseconds that the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception, or -1 to wait indefinitely.

      Default is infinite ‼️

    • Tomcat: maxWait

      (int) The maximum number of milliseconds that the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception. Default value is 30000 (30 seconds)

    My rule of thumb is to set this timeout to under 5 seconds.
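Assuming Spring Boot's datasource property binding, the wait timeouts above can be pinned down in application.properties; the 5-second values are my illustrative choice:

```properties
# HikariCP connectionTimeout (milliseconds)
spring.datasource.hikari.connection-timeout=5000
# Tomcat JDBC maxWait (milliseconds)
spring.datasource.tomcat.max-wait=5000
# DBCP 2 maxWaitMillis (milliseconds)
spring.datasource.dbcp2.max-wait-millis=5000
```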

    Validate the acquired connection

    A database connection can stay open for several hours or even days. However, because there is a network involved, there are numerous cases where a socket that is seemingly open on the client side may in fact be part of a broken connection. A well-behaving connection pool should avoid handing such a connection to application code. A common strategy to alleviate the problem is to test the connection just before it is taken out of the pool. In the past the test was performed using a simple SQL query, e.g. SELECT 1. Nowadays there is an isValid method available on the JDBC Connection itself, which moves the responsibility to the driver:

    Returns true if the connection has not been closed and is still valid. The driver shall submit a query on the connection or use some other mechanism that positively verifies the connection is still valid when this method is called.

    • Hikari: validationTimeout:

      This property controls the maximum amount of time that a connection will be tested for aliveness. This value must be less than the connectionTimeout. Lowest acceptable validation timeout is 250 ms. Default: 5000

    • DBCP: validationQueryTimeout:

      The timeout in seconds before connection validation queries fail. If set to a positive value, this value is passed to the driver via the setQueryTimeout method of the Statement used to execute the validation query.

      Default is infinite ‼️

    • Tomcat: validationQueryTimeout:

      (int) The timeout in seconds before a connection validation queries fail. This works by calling java.sql.Statement.setQueryTimeout(seconds) on the statement that executes the validationQuery. The pool itself doesn’t timeout the query, it is still up to the JDBC driver to enforce query timeouts. A value less than or equal to zero will disable this feature. The default value is -1.

      Default is infinite ‼️

    Send statement(s) to database and read query results

    We have finally arrived at the most common usage. Every query that we send to a database should have a timeout configured either at the statement level or at the transaction level. When it comes to individual statements, there is setQueryTimeout available:

    Sets the number of seconds the driver will wait for a Statement object to execute to the given number of seconds. By default there is no limit on the amount of time allowed for a running statement to complete. If the limit is exceeded, an SQLTimeoutException is thrown. A JDBC driver must apply this limit to the execute, executeQuery and executeUpdate methods.

    Additionally, it’s up to the driver to decide what the above timeout means exactly:

    Note: JDBC driver implementations may also apply this limit to ResultSet methods (consult your driver vendor documentation for details).

    Note: In the case of Statement batching, it is implementation defined as to whether the time-out is applied to individual SQL commands added via the addBatch method or to the entire batch of SQL commands invoked by the executeBatch method (consult your driver vendor documentation for details).

    The time required for a query to complete is very use-case dependent, thus we should not expect a sane default to exist. Instead, we need to ask ourselves how long we are willing to wait for a query to complete. It is very easy to forget about this rule, hence it is very handy to be able to set this timeout globally:

    • DBCP: defaultQueryTimeout

      If non-null, the value of this Integer property determines the query timeout that will be used for Statements created from connections managed by the pool. The default is null, which means that the driver default will be used.

    • Tomcat: queryTimeout available through QueryTimeoutInterceptor

      (int as String) The number of seconds to set for the query timeout. A value less than or equal to zero will disable this feature. The default value is 1 second.

    • Hikari: Not available but fairly easy to add by wrapping a DataSource e.g.:

    class CustomTimeoutsDataSource(val innerDataSource: DataSource, private val queryTimeoutProperties: DataSourceConfiguration.QueryTimeoutProperties) : DataSource by innerDataSource {
    
        override fun getConnection(username: String?, password: String?) = configureTimeouts(innerDataSource.getConnection(username, password))
        override fun getConnection() = configureTimeouts(innerDataSource.connection)
    
        private fun configureTimeouts(connection: Connection): Connection = CustomTimeoutsConnection(connection, queryTimeoutProperties)
    
        private class CustomTimeoutsConnection(val innerConnection: Connection, private val queryTimeoutProperties: DataSourceConfiguration.QueryTimeoutProperties)
            : Connection by innerConnection {
    
            private fun <T : Statement> configure(statement: T): T {
                // 0 means no timeout
                val desiredTimeout = queryTimeoutProperties.statementQueryTimeoutInSeconds ?: 0
                statement.queryTimeout = desiredTimeout
                LOG.trace("Configured timeout of {} seconds for statement {}", desiredTimeout, statement)
                return statement
            }
    
            override fun prepareStatement(sql: String?, autoGeneratedKeys: Int) = configure(innerConnection.prepareStatement(sql, autoGeneratedKeys))
            override fun prepareStatement(sql: String?, resultSetType: Int, resultSetConcurrency: Int, resultSetHoldability: Int) = configure(innerConnection.prepareStatement(sql, resultSetType, resultSetConcurrency, resultSetHoldability))
            override fun prepareStatement(sql: String?) = configure(innerConnection.prepareStatement(sql))
            override fun prepareStatement(sql: String?, columnNames: Array<out String>?) = configure(innerConnection.prepareStatement(sql, columnNames))
            override fun prepareStatement(sql: String?, resultSetType: Int, resultSetConcurrency: Int) = configure(innerConnection.prepareStatement(sql, resultSetType, resultSetConcurrency))
            override fun prepareStatement(sql: String?, columnIndexes: IntArray?) = configure(innerConnection.prepareStatement(sql, columnIndexes))
            override fun prepareCall(sql: String?) = configure(innerConnection.prepareCall(sql))
            override fun prepareCall(sql: String?, resultSetType: Int, resultSetConcurrency: Int) = configure(innerConnection.prepareCall(sql, resultSetType, resultSetConcurrency))
            override fun prepareCall(sql: String?, resultSetType: Int, resultSetConcurrency: Int, resultSetHoldability: Int) = configure(innerConnection.prepareCall(sql, resultSetType, resultSetConcurrency, resultSetHoldability))
            override fun createStatement() = configure(innerConnection.createStatement())
            override fun createStatement(resultSetType: Int, resultSetConcurrency: Int) = configure(innerConnection.createStatement(resultSetType, resultSetConcurrency))
            override fun createStatement(resultSetType: Int, resultSetConcurrency: Int, resultSetHoldability: Int) = configure(innerConnection.createStatement(resultSetType, resultSetConcurrency, resultSetHoldability))
    
            override fun toString(): String {
                return "CustomTimeoutsConnection(innerConnection=$innerConnection)"
            }
        }
    
        companion object {
            private val LOG = LoggerFactory.getLogger(CustomTimeoutsDataSource::class.java)
        }
    }
    

    The JDBC-level queryTimeout is enforced on the application side, i.e. there is code executed after the timeout elapses which stops the query execution. Recent releases of both MySQL and PostgreSQL offer database-server-level statement timeout capabilities.

    • MySQL: MAX_EXECUTION_TIME

      The MAX_EXECUTION_TIME hint is permitted only for SELECT statements. It places a limit N (a timeout value in milliseconds) on how long a statement is permitted to execute before the server terminates it:

    • PostgreSQL: statement_timeout

      Abort any statement that takes more than the specified number of milliseconds, starting from the time the command arrives at the server from the client. If log_min_error_statement is set to ERROR or lower, the statement that timed out will also be logged. A value of zero (the default) turns this off. Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions.
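Both server-side limits can be tried out directly from SQL; a sketch with a 2-second limit (the table name is made up):

```sql
-- MySQL: optimizer hint limiting this SELECT to 2000 ms
SELECT /*+ MAX_EXECUTION_TIME(2000) */ * FROM orders;

-- PostgreSQL: limit every statement in the current session to 2000 ms
SET statement_timeout = 2000;
```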

    If you are using a JPA provider like Hibernate, you might be urged to use javax.persistence.query.timeout. However, from my experience with Hibernate, this timeout, when configured globally, is enforced in some use cases and completely ignored in others. Multiple bugs have been reported related to this feature: Bug 303360, HHH-9929, “Query timeout in persistence.xml doesn’t work”, and some of them are still not addressed.

    There is no transaction-scoped timeout available in JDBC. However, one can still apply the timeout in the application code through e.g. @Transactional.timeout.

    Keep all timeouts short

    The best timeout is a short one. It is often tempting to increase a query or wait timeout in the face of performance or throughput problems. However, doing so will increase the amount of resources blocked on the server and is thus rarely a good choice. Blocking more and more server resources, e.g. threads, may at some point cause the entire server to collapse abruptly. I keep the timeouts as short as possible, especially when a certain API is called often. If you are looking for a good read on the topic I highly recommend Release It! by Michael T. Nygard. This book covers many resiliency-related topics, including timeouts, and provides strategies to avoid increasing them.

  • The importance of timeouts

    23 October 2017

    Timeouts are not an exciting thing to talk about. They do not add immediately perceivable value. They are difficult to get right and force one to consider problems that are hard to solve. In fact, in my experience, timeouts are only ever considered when our software stops working or is about to. That is an enormous shame since, in my opinion, carefully applied timeouts can vastly improve software resiliency.

    Man with a wrist clock

    An example application

    Let us consider a simplistic example of a Spring Boot application generated using Spring Initializr. The application will only expose the actuator API, which by default defines a health check endpoint. Our example will also have a mail module configured.

    The dependencies section of build.gradle:

    dependencies {
    	compile('org.springframework.boot:spring-boot-starter-actuator')
    	compile('org.springframework.boot:spring-boot-starter-mail')
    	compile('org.springframework.boot:spring-boot-starter-web')
    	compile("org.jetbrains.kotlin:kotlin-stdlib-jre8:${kotlinVersion}")
    	compile("org.jetbrains.kotlin:kotlin-reflect:${kotlinVersion}")
    }
    

    A typical health check endpoint verifies that the application's integration points work correctly. If the service talks to a database, a connection is established and verified. Free disk space is checked. If the service sends emails through SMTP, a connection to the server is established.

    The health checks are auto discovered and enabled when you include Spring Boot Actuator. By default, the /health path is used for the endpoint. The SMTP server host, port and credentials obviously need to be configured. At minimum a host entry is required, e.g. spring.mail.host=localhost. For debugging purposes one can disable actuator security with management.security.enabled=false to verify which health checks are performed by the actuator.
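    Put together, the configuration just mentioned might look like this in application.properties (the localhost value is only an example):

    ```
    spring.mail.host=localhost
    management.security.enabled=false
    ```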

    The last part of our example is the trivial application code:

    @SpringBootApplication
    class Application {
        companion object {
            @JvmStatic
            fun main(args: Array<String>) {
                SpringApplication.run(Application::class.java, *args)
            }
        }
    }
    

    When you request /health the API will return a response similar to:

    HTTP/1.1 200 
    Content-Type: application/vnd.spring-boot.actuator.v1+json;charset=UTF-8
    Date: Mon, 23 Oct 2017 08:08:32 GMT
    Transfer-Encoding: chunked
    X-Application-Context: application
    
    {
        "diskSpace": {
            "free": 105755779072,
            "status": "UP",
            "threshold": 10485760,
            "total": 499046809600
        },
        "mail": {
            "location": "localhost:-1",
            "status": "UP"
        },
        "status": "UP"
    }
    

    An application health API, like the one in our example, is often hooked into external monitoring software. The monitor asks the target application about its health in regular intervals e.g. every 5 seconds.

    Shooting yourself in the foot

    The above example has an issue that can kill a production server. More importantly, other metrics that are usually monitored, e.g. CPU and memory usage, will not warn about the upcoming, dreadful service stall. The application will also not be suffering from an enormous number of requests or emails being sent.

    Imagine that the health endpoint is checked every 5 seconds and that there is an intermittent issue with the SMTP server. The health endpoint will rightfully try to connect to the SMTP server and from time to time respond with an error. From my experience, when a health check is introduced it typically takes a while to tweak the monitor thresholds so that we get rid of false alarms. It is thus very easy to ignore intermittent errors when we think they are caused by overly sensitive thresholds. However, the ignored errors can after a while cause our server to stop responding to any request.


    Why can this happen, you ask? There is no timeout configured!

    The mail server health check uses javax.mail.Service.connect under the hood. For a variety of reasons an attempt to establish a TCP connection can take arbitrarily longer than usual. Unfortunately, the default timeouts used by javax.mail.* are infinite. A thread that waits for a connection to be established cannot serve other requests even though it barely uses any CPU. The default maximum thread pool size used by the embedded Tomcat in a Spring Boot application is 200. Assuming that a blocked connection attempt happens twice an hour, our application will stop working after about 4 days.
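    The back-of-the-envelope arithmetic behind that estimate can be sketched as:

    ```java
    public class ThreadExhaustionEstimate {

        // Hours until the worker pool is fully consumed by permanently blocked threads.
        static double hoursUntilStall(int maxThreads, int blockedPerHour) {
            return (double) maxThreads / blockedPerHour;
        }

        public static void main(String[] args) {
            // 200 default Tomcat worker threads, 2 lost per hour to infinite connect timeouts
            double hours = hoursUntilStall(200, 2);
            System.out.printf("Stall after %.0f hours (~%.1f days)%n", hours, hours / 24);
        }
    }
    ```

    100 hours is a little over 4 days, after which no worker thread is left to serve any request.
    
    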

    Never use infinite timeouts

    As you can see, it is very easy to miss the need for a timeout. To be fair, the Spring Boot documentation states clearly:

    In particular, certain default timeout values are infinite and you may want to change that to avoid having a thread blocked by an unresponsive mail server:

    spring.mail.properties.mail.smtp.connectiontimeout=5000
    spring.mail.properties.mail.smtp.timeout=3000
    spring.mail.properties.mail.smtp.writetimeout=5000
    

    In my opinion any library or framework should either force the programmer to configure a timeout or provide a sensible default. Unfortunately, it is not always possible to introduce defaults later on without breaking changes, hence we should always check what timeouts are used when calling any external service.

    Timeouts needed everywhere

    Imagine a controller action method that inserts a single row into a database. Let us further assume that the endpoint is called 50 times per second and typically takes 100ms to complete. Things work well until we encounter intermittent sloppiness of the database and the insert now takes 2 seconds to complete. The clients calling the API do not slow down. More request threads are blocked and more database connections are taken out of the pool. Soon all database connections are in use and other API endpoints start to fail. This is an example of a cascading failure, i.e. a problem in one component propagating to others. Such issues are easier to avoid when there is a timeout configured on both the controller action and the interaction with the database.

    Every API endpoint should have a timeout configured if we want our services to be resilient. Unfortunately this rule is easy to neglect. Moreover, some frameworks do not even expose such an ability. Even in the immensely popular Spring MVC I could not find a way to set such a timeout other than using an async method. Fortunately there are libraries, e.g. Hystrix, that tackle this problem and can be integrated easily.
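    One framework-agnostic way to bound a blocking call, sketched below with plain java.util.concurrent (the simulated tasks and the 200ms budget are made up for illustration):

    ```java
    import java.util.concurrent.*;

    public class TimeoutGuard {
        private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

        // Runs the given task but gives up after the specified timeout.
        static <T> T callWithTimeout(Callable<T> task, long timeoutMs) throws Exception {
            Future<T> future = POOL.submit(task);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the hanging task
                throw e;
            }
        }

        public static void main(String[] args) throws Exception {
            // A fast task completes within the budget.
            System.out.println(callWithTimeout(() -> "row inserted", 200));
            // A task simulating a sloppy database breaches the budget.
            try {
                callWithTimeout(() -> { Thread.sleep(2000); return "too late"; }, 200);
            } catch (TimeoutException e) {
                System.out.println("timed out, releasing the request thread");
            }
            POOL.shutdownNow();
        }
    }
    ```

    The key point is that the caller gets its thread back after the budget expires instead of waiting indefinitely; dedicated libraries such as Hystrix add bulkheading and fallbacks on top of this idea.
    
    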

    Just to recap here is a short and incomplete list of cases where a timeout should be configured:

    • a controller action
    • a database query, statement
    • a database pool interaction
    • a thread pool interaction
    • an API client e.g. HTTP, SOAP, SMTP

    I will describe how to deal with the cases in the following posts.

  • How to clear database in Spring Boot tests?

    13 October 2017

    Nowadays using a production like database in unit¹ tests is a common practice. Calling a real database can increase our confidence that the tested code actually works. Having said that, a database, by its very nature, brings external state into a test that will affect its behavior, hence we need to pay special attention when preparing the test execution. There are a couple of ways to handle the database state in tests and I’m going to describe the approach I like most.

    Database

    Problems with Spring Boot Transactional tests

    Spring Boot offers many helpers to make testing applications easier. Among many others, you can use @DataJpaTest, which by default will configure an in-memory embedded database. You can use a production type database in tests by adding @AutoConfigureTestDatabase(replace=Replace.NONE) like so:

    @RunWith(SpringRunner.class)
    @DataJpaTest
    @AutoConfigureTestDatabase(replace=Replace.NONE)
    public class ExampleRepositoryTests {
    
        // ...
    
    }
    
    

    The @DataJpaTest uses @Transactional under the hood. A test is wrapped inside a transaction that is rolled back at the end. This means that when using e.g. Hibernate one needs to pay special attention to how the tested code is written. As shown in the Java example below, a manual flush is indeed required:

    @RunWith(SpringRunner.class)
    @ContextConfiguration(classes = TestConfig.class)
    @Transactional
    public class HibernateUserRepositoryTests {
        ...
        @Test
        public void createUser() {
            // track initial state in test database:
            final int count = countRowsInTable("user");
    
            User user = new User(...);
            repository.save(user);
    
            // Manual flush is required to avoid false positive in test
            sessionFactory.getCurrentSession().flush();
            assertNumUsers(count + 1);
        }
    }
    

    Using the @Transactional annotation on tests is certainly easy but I still don’t use it, for the following reasons:

    • The production code uses transactions with a different scope.
    • It is easy to forget about a flush and thus get a false positive in a test.
    • On failure, and when debugging, it is hard to see what values were actually saved in the db.
    • It is much harder to write tests of production code that requires a transaction to be committed.
    • The test code needs to be more tightly coupled to the production code, and we all know that hinders refactoring.

    Cleaning database with SQL

    In tests involving a database I reset its state before each test using plain old SQL. This makes the test code less dependent on how a transaction is scoped inside the production code. Furthermore, one can easily review the values saved after a test failure. It turns out it is easy to write a JUnit @Rule or BeforeEachCallback that removes all rows from all tables. Moreover, we can do so without hard coding table names, which would increase maintenance cost.

    Let’s start with defining a @Rule in Kotlin that will be called before each test:

    import org.junit.rules.ExternalResource
    import org.springframework.stereotype.Component
    import javax.sql.DataSource
    
    @Component
    class DatabaseCleanerRule(private val dataSource: DataSource) : ExternalResource() {
    
        override fun before() {
            if (databaseCleaner == null) {
                // Consider inspecting dataSource to check if we are connecting to test database
                databaseCleaner = DatabaseCleaner(dataSource::getConnection)
            }
            databaseCleaner!!.reset()
        }
    
        companion object {
            internal var databaseCleaner: DatabaseCleaner? = null
        }
    }
    

    Consider inspecting the dataSource to check whether we are about to connect to the test database and not the one used for development. It is very easy to use an incorrect Spring profile and wipe out your development data. Ask me how I know.

    We can use the DatabaseCleanerRule in a Spring enabled test as any other JUnit rule, e.g. @get:Rule @Inject lateinit var cleanerRule: DatabaseCleanerRule (note the @get: use-site target, so that JUnit sees the rule through the property getter).
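    A fuller usage sketch might look as follows (the test class, repository and entity names are made up for illustration):

    ```kotlin
    @RunWith(SpringRunner::class)
    @SpringBootTest
    class UserRepositoryTest {

        // @get: targets the getter so JUnit recognizes the rule
        @get:Rule
        @Inject
        lateinit var cleanerRule: DatabaseCleanerRule

        @Inject
        lateinit var repository: UserRepository

        @Test
        fun `saves a user`() {
            repository.save(User(name = "jane"))
            assertEquals(1, repository.count())
        }
    }
    ```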

    Notice that we’ve delegated the actual important work to DatabaseCleaner class defined in Kotlin below.

    import com.practi.util.iterator
    import org.slf4j.LoggerFactory
    import java.sql.Connection
    import java.sql.PreparedStatement
    import java.sql.SQLException
    
    class DatabaseCleaner(private val connectionProvider: () -> Connection) {
        private val tablesToExclude = mutableSetOf<String>()
        private var tablesForClearing: List<TableRef>? = null
    
        fun excludeTables(vararg tableNames: String) {
            tablesToExclude += tableNames.flatMap { listOf(it, it.toLowerCase()) }
        }
    
        fun reset() {
            if (notPrepared) {
                prepare()
            }
            executeReset()
        }
    
        private val notPrepared get() = tablesForClearing == null
    
        private fun prepare() {
            connectionProvider().use { connection ->
                val metaData = connection.metaData
                val tableRefs = metaData.getTables(connection.catalog, null, null, arrayOf("TABLE")).use { tables ->
                    iterator(tables::next) { tables.getString("TABLE_NAME") }
                        .asSequence()
                        .filterNot(tablesToExclude::contains)
                        .map(::TableRef)
                        .toList()
                }
    
                tablesForClearing = tableRefs
    
                LOG.info("Prepared clean db command: {}", tablesForClearing)
            }
        }
    
        private fun executeReset() {
            try {
                connectionProvider().use { connection ->
                    val reset = buildClearStatement(connection)
                    val result = reset.executeBatch()
                    result
                }
            } catch (e: SQLException) {
                val status = engineInnoDbStatus()
                LOG.error("Failed to remove rows because {}. InnoDb status: {}", e, status)
                throw e
            }
        }
    
        private fun engineInnoDbStatus(): String { ... }
    
        private fun buildClearStatement(connection: Connection): PreparedStatement {
            val reset = connection.prepareStatement("")
            reset.addBatch("SET FOREIGN_KEY_CHECKS = 0")
            tablesForClearing?.forEach { ref ->
                reset.addBatch("DELETE FROM `${ref.name}`")
            }
            reset.addBatch("SET FOREIGN_KEY_CHECKS = 1")
            return reset
        }
    
        data class TableRef(val name: String)
    
        companion object {
            private val LOG = LoggerFactory.getLogger(DatabaseCleaner::class.java)!!
        }
    }
    

    Notice that we’ve defined a tablesToExclude set that allows us to omit certain tables. This comes in handy when you’re using a database migration tool that stores its state inside some table(s).
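    For instance, with Flyway one might keep the migration metadata intact between tests (the table names below depend on the Flyway version and are given as examples):

    ```kotlin
    databaseCleaner.excludeTables("schema_version", "flyway_schema_history")
    ```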

    The JDBC metadata API allows us to introspect the schema regardless of the database vendor. The iterator is a tiny Kotlin helper function that aids consuming iterator-like objects:

    inline fun <T> iterator(crossinline next: () -> Boolean, crossinline value: () -> T): AbstractIterator<out T> = object : AbstractIterator<T>() {
        override fun computeNext() {
            if (next()) {
                setNext(value())
            } else {
                done()
            }
        }
    }
    

    The buildClearStatement constructs a large query that DELETEs all rows from each relevant table. The example above uses MySQL, where it is very easy to disable foreign key checks. This is important since foreign keys would prevent rows from being removed unless we paid special attention to the order of removal. A more generic example of how to deal with referential integrity when clearing a database can be found in the Respawn project.

    Last but not least, when a SQLException is thrown we log the exception accompanied by the output of SHOW ENGINE INNODB STATUS. The status information can hint at the failure reason, e.g. another test process executing against the same database or a rogue, runaway thread locking some rows.

    private fun engineInnoDbStatus(): String {
        return connectionProvider().use { connection ->
            connection.createStatement().executeQuery("SHOW ENGINE INNODB STATUS ").use {
                iterator(it::next) { it.getString("Status") }.asSequence().toList()
            }.joinToString(System.lineSeparator())
        }
    }
    

    The above examples show that it is not hard to reset the database manually. I’ve found that this approach makes my tests more trustworthy and less coupled to the underlying persistence layer. In fact, we can easily switch e.g. from JPA to JdbcTemplate in a performance critical code area without needing to change a test.

    ¹ Whether it is actually a unit or an integration test is a different topic.