This is Ahmed's Oracle Fusion Middleware Blog: com.oracle.bpel.client.delivery.ReceiveTimeOutException in 50% of instances in a BPEL Process Manager cluster

At a recent customer, an issue started occurring in some BPEL processes when we migrated from a single-node Oracle BPEL Process Manager 10g environment to a 2-node cluster.

The Problem

1. BPEL process SubscriberActivation is instantiated (instance 4961221)

2. It synchronously calls GetDeviceState (instance 4961222)

3. GetDeviceState completes successfully within 6.106 seconds

4. Around 50% of the time, the response never reaches the first process, which returns the following BPEL fault:

<summary>
when invoking locally the endpoint 'http://oradev1.thisisahmed.com:7777/orabpel/default/GetDeviceState/1.0', ; nested exception is
com.oracle.bpel.client.delivery.ReceiveTimeOutException: Waiting for response has timed out
</summary>

With debugging enabled, the domain.log shows the following:

<2010-06-25 07:55:49,226> <DEBUG> <default.collaxa.cube.engine.delivery> <DeliveryHandler::initialRequestAnyType>
com.oracle.bpel.client.delivery.ReceiveTimeOutException: Waiting for response has timed out. The conversation id is bpel://localhost/default/SubscriberActivation~1.0/5040805-BpInv1-BpSeq4.7-2. Please check the process instance for detail.
at com.collaxa.cube.engine.delivery.DeliveryHandler.initialRequestAnyType(DeliveryHandler.java:543)
at com.collaxa.cube.engine.delivery.DeliveryHandler.initialRequest(DeliveryHandler.java:457)

5. The parent process times out after 1000 seconds, which is the value of our syncMaxWaitTime (as shown in the Tree Finder figure above).
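Steps 4 and 5 boil down to a caller that blocks on an in-memory reply for at most syncMaxWaitTime seconds. The sketch below models only that wait-with-timeout behavior using `threading.Event`; `invoke_sync` and `ReceiveTimeOut` are hypothetical names, not BPEL Process Manager APIs, and the engine's real delivery code is certainly more involved.

```python
import threading


class ReceiveTimeOut(Exception):
    """Hypothetical stand-in for com.oracle.bpel.client.delivery.ReceiveTimeOutException."""


def invoke_sync(start_callee, sync_max_wait_time):
    """Start the callee, then block until it replies or the wait expires.

    start_callee receives a reply(value) function; it may call it right away,
    later from another thread, or never.
    """
    replied = threading.Event()
    result = {}

    def reply(value):
        result["value"] = value
        replied.set()

    start_callee(reply)
    # Event.wait returns False if the timeout elapses before set() is called
    if not replied.wait(timeout=sync_max_wait_time):
        raise ReceiveTimeOut("Waiting for response has timed out")
    return result["value"]


# A callee that answers from another thread after 0.1 s succeeds...
def slow_but_fine(reply):
    threading.Timer(0.1, reply, args=["device state"]).start()


print(invoke_sync(slow_but_fine, sync_max_wait_time=2))  # device state

# ...while a callee whose reply never arrives raises once the wait expires.
try:
    invoke_sync(lambda reply: None, sync_max_wait_time=0.5)
except ReceiveTimeOut as exc:
    print("fault:", exc)  # fault: Waiting for response has timed out
```

In this model the second case corresponds to the instances that sit for the full 1000 seconds even though the callee itself finished in about 6 seconds.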

This only happens when both nodes of the cluster are up and running. If only a single node is running, this issue does not occur.
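To see which callers are affected, the ReceiveTimeOutException lines in domain.log can be pulled out with a short script. This is a hypothetical helper, not an Oracle tool; the conversation id layout it parses is inferred from the single debug line quoted above, and treating the leading number after the revision as the caller's instance id is an assumption, so check it against your own logs.

```python
import re

# Assumed layout, inferred only from the sample line in this post:
#   bpel://<host>/<domain>/<process>~<revision>/<number>-<suffix>
CONVERSATION_ID = re.compile(
    r"bpel://(?P<host>[^/\s]+)/(?P<domain>[^/\s]+)/"
    r"(?P<process>[^~/\s]+)~(?P<revision>[^/\s]+)/(?P<instance>\d+)-"
)


def timed_out_calls(log_lines):
    """Yield (process, revision, instance) for each ReceiveTimeOutException line."""
    for line in log_lines:
        if "ReceiveTimeOutException" not in line:
            continue
        match = CONVERSATION_ID.search(line)
        if match:
            yield match["process"], match["revision"], match["instance"]


sample = (
    "com.oracle.bpel.client.delivery.ReceiveTimeOutException: Waiting for response "
    "has timed out. The conversation id is bpel://localhost/default/"
    "SubscriberActivation~1.0/5040805-BpInv1-BpSeq4.7-2. Please check the "
    "process instance for detail."
)
print(list(timed_out_calls([sample])))  # [('SubscriberActivation', '1.0', '5040805')]
```

Against a real file, pass the open file object (for example `with open("domain.log") as f: ...`), since iterating a file yields one line at a time.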

Troubleshooting Efforts

The BPEL Process Manager Developer's Guide 10g (10.1.3.1.0) suggests trying the following, none of which applies to our situation.

(a) Increasing transaction-timeout="7200" in $ORACLE_HOME/j2ee/oc4j_soa/config/transaction-manager.xml

(b) Setting transaction-timeout="3600" (a value lower than the global one above) for CubeEngineBean, DispatcherBean, CubeDeliveryBean, DeliveryBean, DomainManagerBean, and ProcessManagerBean in $ORACLE_HOME/j2ee/oc4j_soa/application-deployments/orabpel/ejb_ob_engine/orion-ejb-jar.xml

(c) Increasing syncMaxWaitTime to 1000 in $ORACLE_HOME/bpel/domains/default/config/domain.xml
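The values in (a) to (c) are nested: syncMaxWaitTime (1000) sits below the per-bean EJB transaction-timeout (3600), which sits below the global transaction-timeout (7200). Treating that nesting as a rule is an assumption drawn from the values quoted above. The hypothetical helper below checks it; note that a consistent set of timeouts only controls when the caller gives up, not whether the reply can reach it, which is why none of these settings helps here.

```python
def check_timeouts(sync_max_wait_time, ejb_tx_timeout, global_tx_timeout):
    """Return problems with the assumed nesting (all values in seconds):
    syncMaxWaitTime < EJB transaction-timeout < global transaction-timeout."""
    problems = []
    if not sync_max_wait_time < ejb_tx_timeout:
        problems.append("syncMaxWaitTime should be lower than the EJB transaction-timeout")
    if not ejb_tx_timeout < global_tx_timeout:
        problems.append("EJB transaction-timeout should be lower than the global transaction-timeout")
    return problems


# Values quoted in (a), (b) and (c)
print(check_timeouts(sync_max_wait_time=1000, ejb_tx_timeout=3600, global_tx_timeout=7200))  # []
```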

Using TCP instead of UDP for the BPEL PM cluster (in $ORACLE_HOME/bpel/system/config/jgroups-protocol.xml) has no bearing either.

Adding the transaction participate property to the partner link does not help either: <property name="transaction">participate</property>

Cause of Problem & Analysis

This problem is caused by a flaw in the design of the flow: it does not support running in a BPEL Process Manager cluster.

Both BPEL1 and BPEL2 are designed as synchronous processes.

Success scenario:

Client makes a sync request to BPEL1.
BPEL1 makes a sync request to BPEL2.
BPEL2 makes a sync request to an external service (and receives the sync response back).
BPEL2 has an async “receive” activity.
BPEL2 responds to BPEL1 synchronously.
BPEL1 responds to the client synchronously.

Timeout scenario:

Client makes a sync request to BPEL1.
BPEL1 makes a sync request to BPEL2.
BPEL2 makes a sync request to an external service (and receives the sync response back).
BPEL2 has an async “receive” activity, but the callback arrives on Node 2.
BPEL2 tries to reply to BPEL1, but has no link back to it (BPEL2 itself completes successfully).
BPEL1 times out.

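Even though BPEL2 is designed as a synchronous process, its async receive activity forces it to behave asynchronously: the instance dehydrates at the receive and is rehydrated when the callback arrives. The problem is that the callback can be received on either node. When it lands on the other node, BPEL2 is rehydrated there and its reply has no way back to BPEL1, which is still waiting on the original node. That is exactly the behavior we are seeing here 50% of the time.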
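The two scenarios can be reduced to a toy simulation. It assumes BPEL1 waits in memory on one node, BPEL2 dehydrates at its async receive, and the cluster hands the callback to either node with equal probability; `call_succeeds` is a hypothetical name and the uniform routing is an assumption, not a description of the actual load balancer.

```python
import random


def call_succeeds(rng, nodes=("node1", "node2"), caller_node="node1"):
    """Simulate one BPEL1 -> BPEL2 synchronous call.

    Assumptions: BPEL1 waits in memory on caller_node, BPEL2 dehydrates at its
    async receive, and the callback is routed uniformly at random to a node.
    """
    callback_node = rng.choice(nodes)  # node that rehydrates BPEL2
    # BPEL2's synchronous reply only reaches BPEL1 if both run on the same node
    return callback_node == caller_node


rng = random.Random(2010)
runs = 10_000
timeouts = sum(not call_succeeds(rng) for _ in range(runs))
print(f"2 nodes: {timeouts / runs:.1%} of calls time out")

single = sum(not call_succeeds(rng, nodes=("node1",)) for _ in range(runs))
print(f"1 node:  {single / runs:.1%} of calls time out")  # 0.0%
```

With two nodes roughly half the calls fail, and with one node none do, which matches the observation that shutting down one node makes the problem disappear.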

In fact, a blog post of mine from two years ago discusses the same issue:
http://blog.thisisahmed.com/2008/10/behavior-of-bpel-processes-in-bpel.html

July 8, 2010
