It’s no secret that most blog post ideas are born while dealing with production problems, and this article is no exception. Today I would like to show you another tool that every non-junior developer should know. It’s a super useful tool from the JVM world, yet still a mystery to some of us, so let’s change that.
A super simple definition worth a thousand words: “[jstack] prints Java stack traces of Java threads for a given Java process or core file or a remote debug server. For each Java frame, the full class name, method name, ‘bci’ (byte code index) and line number, if available, are printed.”
```
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode):

"Attach Listener" #85 daemon prio=9 os_prio=0 tid=0x00007fe6ac054000 nid=0x755b waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"Thread-12" #83 daemon prio=5 os_prio=0 tid=0x00007fe6b40f3800 nid=0x7510 waiting on condition [0x00007fe6a66d5000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000006cd412420> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"http-bio-6281-exec-16" #82 daemon prio=5 os_prio=0 tid=0x00007fe6b00f0000 nid=0x7416 waiting on condition [0x00007fe6a7cfd000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000006cec048f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:86)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:32)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- None

"YJPAgent-Telemetry" #81 daemon prio=5 os_prio=0 tid=0x00007fe6a000b800 nid=0x7415 in Object.wait() [0x00007fe67f5f4000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.$$YJP$$wait(Native Method)
	at java.lang.Object.wait(Object.java)
	at com.yourkit.runtime.TelemetryThread.run(TelemetryThread.java:627)
	- locked <0x00000006cec04ad0> (a java.lang.Object)

   Locked ownable synchronizers:
	- None

"main" #1 prio=5 os_prio=0 tid=0x00000000009f0000 nid=0x73b4 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
	- None

"VM Thread" os_prio=0 tid=0x0000000001199000 nid=0x73b8 runnable

"Gang worker#0 (Parallel GC Threads)" os_prio=0 tid=0x0000000000a3e800 nid=0x73b5 runnable

"Gang worker#1 (Parallel GC Threads)" os_prio=0 tid=0x00000000009fc800 nid=0x73b6 runnable

"Concurrent Mark-Sweep GC Thread" os_prio=0 tid=0x0000000000eb1800 nid=0x73b7 runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00000000014e5000 nid=0x73c6 waiting on condition

JNI global references: 23956
```
This is a simplified jstack output – a typical JVM process might hold hundreds of threads.
Output like the above is especially useful when we deal with all kinds of deadlocks, freezes or slowdowns. It helped me chase issues like stale connections (with no timeout specified) to a remote service (like RabbitMQ) or deadlocks in production code. While the jstack binary is usually invoked directly on a live JVM process when things are on fire, sometimes it is executed too late, leaving us without evidence of the issue.
I could finish this blog post here; however, I think jstack alone might not be enough for large-scale applications.
Let’s be proactive
The company I used to work with had a cool internal library. Every minute, it logged all active threads and their stacks to a file. I must admit it saved us many times: when something was happening in production, the first things we would check were:
- heap dumps
- the periodically logged thread dumps
Since all of these things could be checked by multiple engineers in parallel, jstack output was one of the fastest ways to understand why a particular JVM process was hanging.
Our internal implementation called Thread.getAllStackTraces()*, iterated over the result, and logged each thread’s stack to files on disk. Super useful, super simple.
*Be aware that a call to “Thread.getAllStackTraces()” must walk all live threads, which briefly pauses the JVM while the current stack traces are collected. You probably should not call it too often.
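The periodic logger described above can be sketched in a few lines. This is a minimal illustration, not the author’s actual library; the class name, file naming scheme and interval are assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodically writes all thread stacks to timestamped files on disk.
class ThreadDumpLogger {

    // Renders every live thread, its state and its stack trace into one string.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Starts a daemon scheduler that writes one dump file every periodMillis ms.
    static ScheduledExecutorService start(Path dir, long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "thread-dump-logger");
            t.setDaemon(true); // never keep the JVM alive just for logging
            return t;
        });
        ses.scheduleAtFixedRate(() -> {
            try {
                Files.writeString(dir.resolve("threads-" + Instant.now().toEpochMilli() + ".txt"),
                                  dumpAllThreads());
            } catch (IOException ignored) {
                // a failed write must never kill the scheduler
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Calling `ThreadDumpLogger.start(Paths.get("/var/log/app"), 60_000)` once at startup would reproduce the one-minute cadence mentioned above.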
When on fire, even if the JVM process had already been restarted, we would quickly ssh into the server, review the last 2–3 files with stack traces and, if we were lucky, see what was going on. During my career I observed two main types of issues where such log files were useful:
- exploded requests – caused by a single request that results in multiple small actions that must be executed within a transaction. A good example might be a move of an item (e.g. a directory) that contains multiple subitems to be moved or copied as well. I saw plenty of backend actions that were capable of logging millions of events to Redis or putting millions of values into Memcached – usually, the sheer number of round trips to remote services was the main issue.
- connection issues – usually, our backends connect to all kinds of databases, caches and so on over the network. When our response time goes up, there is a chance that a service we depend on is having a hard time and causing the trouble we get blamed for.
All these issues can be found just by checking and comparing a few consecutive dumps of stack traces.
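Comparing consecutive dumps can even be automated with a simple heuristic: a thread whose top stack frame is identical across two snapshots taken some time apart is a candidate for being stuck. A sketch under that assumption (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Flags threads whose top stack frame did not change between two snapshots —
// a cheap "possibly stuck" heuristic, not a proof of a deadlock.
class StuckThreadFinder {

    // Snapshot: thread name -> top frame (or null if the thread has no frames).
    static Map<String, String> snapshot() {
        Map<String, String> top = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            top.put(e.getKey().getName(), stack.length > 0 ? stack[0].toString() : null);
        }
        return top;
    }

    // Threads present in both snapshots with an identical top frame.
    static List<String> unchanged(Map<String, String> earlier, Map<String, String> later) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, String> e : earlier.entrySet()) {
            String frame = later.get(e.getKey());
            if (frame != null && frame.equals(e.getValue())) {
                suspects.add(e.getKey());
            }
        }
        return suspects;
    }
}
```

Idle pool workers parked on a queue will also match, so in practice you would filter out known thread-pool idle frames before raising an alarm.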
Let’s be modern
While the above solution worked well for us, I think it could be improved.
First of all, I haven’t seen a library that does what I described, so we would need to craft it ourselves.
Secondly, in the modern era of containers/cloud/k8s, we often do not have direct ssh access to production servers or containers. Not to mention the dynamic nature of containers and their life cycle – sometimes a container might not even exist anymore when we need to check something. That is more than understandable, but it would be great if we could still benefit from knowing what is happening inside our application.
The idea is to automate the process of collecting jstacks and store them in one centralized place – a simple service backed by a database plus a cozy UI would do the job. It would allow engineers to open the UI, select a particular JVM/host and see (or even compare!) jstacks over time.
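On the application side, shipping a dump to such a collector could be as simple as an HTTP POST with the standard `java.net.http` client (Java 11+). The collector URL and the `X-Host` header are hypothetical, since no such service exists in the article:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

// Ships a thread dump to a (hypothetical) central collector instead of local
// disk, so dumps survive container restarts and disappearing hosts.
class DumpShipper {

    // Same rendering idea as the disk-based logger: all threads, all frames.
    static String renderDump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    static HttpRequest buildRequest(String collectorUrl, String host, String dump) {
        return HttpRequest.newBuilder(URI.create(collectorUrl))
                .header("Content-Type", "text/plain")
                .header("X-Host", host) // lets the UI group dumps per JVM/host
                .POST(HttpRequest.BodyPublishers.ofString(dump))
                .build();
    }

    static void ship(String collectorUrl, String host) throws Exception {
        HttpClient.newHttpClient().send(
                buildRequest(collectorUrl, host, renderDump()),
                HttpResponse.BodyHandlers.discarding());
    }
}
```

Scheduling `ship(...)` on the same periodic executor used for disk logging would give the centralized history described above.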
Now I am being more geeky here, but we could harness data-processing tools to analyze our beloved jstacks – there are lots of things we could extract from them. Examples:
- hot code paths – if we observe the same stack trace again and again, chances are it is a hot code path and a possible candidate for further profiling/perf testing.
- top customers by requests/utilization (see below)
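The hot-code-path idea from the first bullet boils down to counting how often the same leading frames appear across many dumps – essentially poor man’s sampling profiling. A sketch (the class and the frame-depth heuristic are my own assumptions):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Aggregates stack traces across many dumps: the more often the same leading
// frames show up, the hotter that code path likely is.
class HotPathAggregator {

    private final Map<String, Integer> counts = new HashMap<>();
    private final int depth; // how many top-of-stack frames identify a "path"

    HotPathAggregator(int depth) { this.depth = depth; }

    // Feed one thread's frames, top of stack first (e.g. parsed from a dump).
    void record(List<String> frames) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < Math.min(depth, frames.size()); i++) {
            key.append(frames.get(i)).append('\n');
        }
        counts.merge(key.toString(), 1, Integer::sum);
    }

    // The most frequently observed path, or null if nothing was recorded.
    String hottest() {
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```

In a real pipeline you would feed this from the parsed, centrally stored dumps and filter out framework plumbing (thread pools, servlet containers) first.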
In the projects I worked on, each JVM thread name was enriched with some metadata, for example the customer or user name. Jobs could then further analyze these jstacks and split the data not only by JVM instance but also by custom tags (e.g. customer name – useful if we develop a SaaS application that hosts many tenants on a single set of hardware).
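Enriching thread names is cheap to do at request boundaries. A minimal sketch, assuming a `tenant=` tag format of my own invention – the key point is restoring the original name, since pooled threads are reused:

```java
import java.util.concurrent.Callable;

// Temporarily tags the current thread's name with request metadata
// (e.g. a customer id) so it shows up in every thread dump taken meanwhile.
class ThreadTagging {

    static <T> T withTenant(String tenant, Callable<T> work) throws Exception {
        Thread current = Thread.currentThread();
        String original = current.getName();
        current.setName(original + " [tenant=" + tenant + "]");
        try {
            return work.call();
        } finally {
            current.setName(original); // always restore: pool threads are shared
        }
    }
}
```

Wrapping each request handler in `withTenant(customerId, ...)` makes dumps like `"http-bio-6281-exec-16 [tenant=acme]"` possible, which is exactly what per-customer splitting needs.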
The greatest challenge is to collect the stacks while avoiding locks during the collection phase. A one-minute interval, as practiced at my former company, is definitely too coarse if we want to analyze things in detail.
This is not a substitute for “real” profiling, but it can still provide some value. Sounds like a good project for a hackathon, a side SaaS business or a feature request, doesn’t it?