Today, a short story about a Mule bug that is neither easy to find nor easy to fix.
I detected the error in one of our apps that has an inbound JMS endpoint. We use ActiveMQ as our message broker, but I believe this can happen with other queueing systems as well. The error depends on the message content, or more precisely on a message header.
You cannot fix it on the receiving end; only the sending application can be corrected. The error depends on the application context or on the data being processed, and it can affect any application at any time. It is rare, but when it happens it completely blocks processing of the queue where it occurred.
The error log looks like the snippet below. The Log4j2 setup was enhanced to trace errors from org.mule.session.LegacySessionHandler, but the key messages, “"flowConstructt_&Lorg/mule/api/construct/FlowConstruct" is malformed and cannot be read” and “org.mule.session.DefaultMuleSession”, can be found even with standard logging levels.
The root cause is the way the MULE_SESSION property is processed in the sending and receiving apps.
In the sending app, the MuleSession object is serialized to Java binary form (!?) and then Base64-encoded.
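To see why the receiver ends up logging fragments like flowConstructt_&Lorg/mule/api/construct/FlowConstruct, here is a minimal Java sketch of the sending side. It only mimics "serialize, then Base64-encode"; FakeSession and FlowConstruct are hypothetical stand-ins, not the real Mule classes. Java serialization writes each object field of a class descriptor as the field name, then a TC_STRING marker (byte 0x74, the letter 't') and a two-byte length, then the JVM type signature. For Lorg/mule/api/construct/FlowConstruct; the length is 38 (0x0026), i.e. a NUL byte followed by '&', which most likely explains the "t_&" in the logged text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

public class SessionSerializationDemo {

    // Hypothetical stand-in for org.mule.api.construct.FlowConstruct.
    interface FlowConstruct {}

    // Hypothetical stand-in for a session object; NOT Mule's DefaultMuleSession.
    static class FakeSession implements Serializable {
        private static final long serialVersionUID = 1L;
        String id = "session-1";
        // Null at runtime, but the field's name and type still go into the class descriptor.
        FlowConstruct flowConstruct;
    }

    // Sender side: Java-serialize the object, then Base64-encode it so it fits in a text header.
    static String serializeAndEncode(Serializable session) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(session);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Decode the header and render each byte as text, replacing non-printable bytes with '_'.
    static String printableView(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            int c = b & 0xFF;
            sb.append(c >= 0x20 && c < 0x7F ? (char) c : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String header = serializeAndEncode(new FakeSession());
        System.out.println("Base64 header: " + header);
        System.out.println("Decoded view:  " + printableView(header));
    }
}
```

In the decoded view, the field name is immediately followed by "t_" and a length character before the type signature, the same shape as the logged fragment; only the class name differs because the stand-in lives in a different package.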
In the receiving app, the value is decoded and some logic is applied to parse it: it is parsed as if it were a text property (split on colons, like key-value property pairs). This sometimes fails on binary data. It does not happen often: we have hundreds of apps and this was the first time it happened, but once it did, it was very repeatable. The failure is critical because it can block queue processing completely.
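The receiving side can be sketched the same way. The Java below is a hypothetical parser loosely following the description above, not Mule's LegacySessionHandler: the class name NaiveSessionParser, the ';' entry separator, the ':' key-value separator and the exact error text are all assumptions. fakeSerializedHeader() hand-builds the bytes that Java serialization writes for a field named flowConstruct of type org.mule.api.construct.FlowConstruct and Base64-encodes them, as the sender does.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

public class NaiveSessionParser {

    // Assumed separators: ';' between entries, ':' between key and value.
    private static final String ENTRY_SEPARATOR = ";";
    private static final char KEY_VALUE_SEPARATOR = ':';

    // Hypothetical receiver-side parser: treats the decoded header as text key-value pairs.
    static Map<String, String> parse(String base64Header) {
        byte[] raw = Base64.getDecoder().decode(base64Header);
        // ISO-8859-1 maps every byte to exactly one char, so binary data survives as "text".
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        Map<String, String> properties = new LinkedHashMap<>();
        for (String entry : decoded.split(ENTRY_SEPARATOR)) {
            if (entry.isEmpty()) {
                continue;
            }
            int sep = entry.indexOf(KEY_VALUE_SEPARATOR);
            if (sep < 0) {
                // A binary chunk without the separator aborts the whole parse.
                throw new IllegalArgumentException(
                        "\"" + printable(entry) + "\" is malformed and cannot be read");
            }
            properties.put(entry.substring(0, sep).trim(), entry.substring(sep + 1).trim());
        }
        return properties;
    }

    // Replace non-printable characters with '_' so the error message is loggable.
    static String printable(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append(c >= 0x20 && c < 0x7F ? c : '_');
        }
        return sb.toString();
    }

    // Bytes Java serialization emits for an object field "flowConstruct" of type
    // org.mule.api.construct.FlowConstruct, Base64-encoded like the sender's header.
    static String fakeSerializedHeader() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte('L');                                     // object field type code
            out.writeUTF("flowConstruct");                          // 2-byte length + field name
            out.writeByte(0x74);                                    // TC_STRING marker ('t')
            out.writeUTF("Lorg/mule/api/construct/FlowConstruct;"); // length 0x0026 ('&') + type
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buf.toByteArray());
    }

    public static void main(String[] args) {
        String textHeader = Base64.getEncoder().encodeToString(
                "ID:42;owner:app1".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println("Text header parsed: " + parse(textHeader));
        try {
            parse(fakeSerializedHeader());
        } catch (IllegalArgumentException e) {
            System.out.println("Binary header failed: " + e.getMessage());
        }
    }
}
```

A plain text header parses cleanly, while the binary one fails with a message containing flowConstructt_&Lorg/mule/api/construct/FlowConstruct, much like the logged one. The parser has no way to tell binary data from text, so once binary reaches it, the outcome depends entirely on which bytes happen to be present.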
MuleSoft support came up with a KB article that I tweaked a little. You need to add two elements to your outbound endpoint.
The fix can be done on the sending end only, so going forward we need to retrofit this change to all applications that push data to ActiveMQ. Until that is done, we are not safe: there can always be something in the data that breaks parsing and leaves the application stuck reading from a queue.
In fact, this should be fixed by MuleSoft; this fix is only a workaround. Until it is solved, I will always remove the MULE_SESSION property. After all, I do not really need it.