Troubleshooting

Find the logs to solve Halvade errors

If Halvade fails with an error, the error itself is printed in the output of the Hadoop command. More detail can usually be found in the stderr logs of the individual executor tasks. The location of these log files is set in the YARN settings: they are typically stored under ${yarn.log.dir}/userlogs or, if the YARN_LOG_DIR environment variable is set, under $YARN_LOG_DIR/userlogs. To get additional information about the cause of an error, set the Halvade log level to DEBUG with the option --log DEBUG.
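
Searching many task logs by hand is tedious, so the following is a minimal sketch of a helper that walks the userlogs directory and prints the first suspicious lines of every stderr file. It assumes the usual YARN layout, in which each container directory holds a file named stderr, and that a search for ERROR, Exception or OutOfMemoryError is a useful first filter. The function name find_error_lines and the pattern are illustrative and not part of Halvade.

```python
import os
import re

# Assumed first-pass filter for failure messages; adjust to your needs.
ERROR_PATTERN = re.compile(r"ERROR|Exception|OutOfMemoryError")


def find_error_lines(userlogs_dir, max_hits_per_file=5):
    """Return {path: [matching lines]} for every file named 'stderr' below userlogs_dir."""
    hits = {}
    for root, _dirs, files in os.walk(userlogs_dir):
        if "stderr" not in files:
            continue
        path = os.path.join(root, "stderr")
        matches = []
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(path, errors="replace") as handle:
            for line in handle:
                if ERROR_PATTERN.search(line):
                    matches.append(line.rstrip("\n"))
                    if len(matches) >= max_hits_per_file:
                        break
        if matches:
            hits[path] = matches
    return hits


if __name__ == "__main__":
    # Falls back to ./userlogs when YARN_LOG_DIR is not set.
    log_root = os.path.join(os.environ.get("YARN_LOG_DIR", "."), "userlogs")
    for path, lines in sorted(find_error_lines(log_root).items()):
        print(path)
        for line in lines:
            print("    " + line)
```

On a multi-node cluster the logs are local to each node, so a script like this would have to be run on every node that hosted an executor.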

Out of Memory or Timeout

Out of Memory errors can occur if the provided memory (or memory overhead) is too low, or if the data is split into too few partitions, which leads to larger genomic regions and therefore more data per partition. The failed task is retried and may succeed on a new executor; in that case the runtime should not be affected much. The cause is not always obvious: the failure can also be reported as a timeout error, which typically indicates that an executor ran out of memory as well. To solve this, either increase the memory and memory overhead per executor, or increase the number of partitions with the --partitions option.
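
The trade-off described above can be made concrete with a little arithmetic. The sketch below assumes Spark's documented default for the executor memory overhead (10% of the executor memory, with a minimum of 384 MiB) and assumes, as a simplification, that the input is divided evenly over the partitions. The function names and the example numbers are illustrative only; how the memory values are passed to Halvade is not covered here.

```python
# Spark's documented default overhead: 10% of executor memory, at least 384 MiB.
MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


def default_overhead_mib(executor_memory_mib):
    """Overhead Spark adds by default when none is configured explicitly."""
    return max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FACTOR))


def container_request_mib(executor_memory_mib, overhead_mib=None):
    """Memory requested from YARN per executor: executor memory plus overhead."""
    if overhead_mib is None:
        overhead_mib = default_overhead_mib(executor_memory_mib)
    return executor_memory_mib + overhead_mib


def input_per_partition_mib(total_input_mib, partitions):
    """Average input per partition, assuming an even split (a simplification)."""
    return total_input_mib / partitions


if __name__ == "__main__":
    # Hypothetical numbers: 16 GiB executors and 100000 MiB of input.
    print(container_request_mib(16384))          # 18022
    print(input_per_partition_mib(100000, 500))  # 200.0
    print(input_per_partition_mib(100000, 1000)) # 100.0
```

Doubling the number of partitions halves the average amount of data each task has to hold, which is why raising --partitions can help when raising the memory is not possible. YARN may round the requested container size up to its own allocation increment, so the actual grant can be somewhat larger than the computed value.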

CoarseGrainedScheduler error

This error sometimes occurs after Halvade has finished and is caused by a problem while shutting down the YARN application. It does not affect the output of Halvade as long as the total time has been printed, for example: 21/07/02 15:53:30 INFO SomaticPipeline$: [Somatic variant calling pipeline] total time (s): 6940.324333113. The error appears after Halvade has finished and looks like this:

...
20/04/11 01:08:01 INFO spark.ChainedCommandBinaryPipedRDD: Removing RDD 26 from persistence list
20/04/11 01:08:02 INFO spark.ChainedCommandBinaryPipedRDD: Removing RDD 70 from persistence list
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
                        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:157)
                        at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:137)
                        at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:186)
                        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:527)
                                        ...


...
java.util.concurrent.RejectedExecutionException: Task scala.concurrent.impl.CallbackRunnable@27276122 rejected from java.util.concurrent.ThreadPoolExecutor@6a3a6d71[Shutting down, pool size = 56, active threads = 0, queued tasks = 0, completed tasks = 346]
...
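
To decide quickly whether a run that ends with this error can still be trusted, the captured output can be checked for the total time line. The following is a minimal sketch, assuming the output of the run was saved as text and that the total time line has the form shown above. The function name summarize_run and the list of shutdown messages are illustrative and not part of Halvade.

```python
import re

# Matches the line Halvade prints when the pipeline has completed.
TOTAL_TIME = re.compile(r"total time \(s\): ([0-9]+(?:\.[0-9]+)?)")

# Messages seen in the shutdown error described in this section.
SHUTDOWN_MESSAGES = (
    "Could not find CoarseGrainedScheduler",
    "RejectedExecutionException",
)


def summarize_run(output_text):
    """Report whether the total time was printed and which shutdown messages appear."""
    match = TOTAL_TIME.search(output_text)
    return {
        "finished": match is not None,
        "total_seconds": float(match.group(1)) if match else None,
        "shutdown_messages": [m for m in SHUTDOWN_MESSAGES if m in output_text],
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_run.py halvade_output.log
    with open(sys.argv[1], errors="replace") as handle:
        print(summarize_run(handle.read()))
```

If finished is true, the shutdown messages can be ignored. If it is false, the run did not complete and the task logs should be inspected instead.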