Troubleshooting
Find the logs to solve Halvade errors
If Halvade fails with an error, the error itself is printed in the output of the Hadoop command. More detailed information is usually available in the stderr logs of the individual executor tasks. The location of these log files is determined by the YARN settings: typically they are stored under ${yarn.log.dir}/userlogs or, if the YARN_LOG_DIR environment variable is set, under $YARN_LOG_DIR/userlogs. To get additional information about the cause of an error, set the Halvade log level to DEBUG with the option --log DEBUG.
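Searching many task logs by hand is tedious, so the lookup described above can be scripted. The following is a minimal sketch, not part of Halvade: it assumes that each container writes a file named stderr somewhere below the userlogs directory, and the function name find_stderr_errors and the list of search patterns are illustrative choices.

```python
import os
import re
from pathlib import Path

# Patterns worth flagging; this list is an assumption, extend it as needed.
ERROR_PATTERN = re.compile(r"ERROR|Exception|OutOfMemoryError")


def find_stderr_errors(userlogs_dir):
    """Return (path, line_number, line) for suspicious lines in every
    file named 'stderr' found below userlogs_dir."""
    hits = []
    for path in sorted(Path(userlogs_dir).rglob("stderr")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log holds odd bytes.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if ERROR_PATTERN.search(line):
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Uses $YARN_LOG_DIR/userlogs, the second location named in the text.
    log_dir = os.environ.get("YARN_LOG_DIR")
    if log_dir is None:
        print("YARN_LOG_DIR is not set; call find_stderr_errors() with the userlogs path.")
    else:
        for path, number, line in find_stderr_errors(Path(log_dir) / "userlogs"):
            print(f"{path}:{number}: {line}")
```

If the logs are stored under ${yarn.log.dir}/userlogs instead, call find_stderr_errors directly with that directory.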
Out of Memory or Timeout
Out of Memory errors can occur if the provided memory (or memory overhead) is too low, or if the data is split into too few partitions, which results in larger genomic regions and therefore more data per partition. The failed task is retried and may succeed in a new executor; in that case the runtime should not be affected much.
The cause is not always obvious: an out-of-memory failure is sometimes reported as a timeout error, which typically also indicates that an executor ran out of memory.
Both cases can be solved by either increasing the memory and memory overhead per executor or increasing the number of partitions with the --partitions option.
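Because a memory problem can surface either as an explicit out-of-memory message or as a timeout, it can help to count both kinds of symptom in a log before deciding what to change. The sketch below is only an illustration: the message fragments are ones commonly seen in Spark and YARN logs, but the exact wording differs between versions, so the SYMPTOMS table and the helper names are assumptions rather than anything defined by Halvade.

```python
import re

# Message fragments commonly associated with memory pressure.
# Exact wording varies between Spark/YARN versions (assumption).
SYMPTOMS = {
    "heap exhausted": re.compile(r"java\.lang\.OutOfMemoryError"),
    "container over memory limit": re.compile(
        r"(exceeding|beyond) (physical |virtual )?memory limits", re.IGNORECASE
    ),
    "timeout (possibly memory related)": re.compile(
        r"timed out|TimeoutException", re.IGNORECASE
    ),
}


def classify_line(line):
    """Return the labels of all symptoms that match a single log line."""
    return [label for label, pattern in SYMPTOMS.items() if pattern.search(line)]


def summarize(lines):
    """Count how often each symptom occurs in an iterable of log lines."""
    counts = {label: 0 for label in SYMPTOMS}
    for line in lines:
        for label in classify_line(line):
            counts[label] += 1
    return counts
```

If any of these counters is non-zero, raising the executor memory and memory overhead or increasing --partitions, as described above, is the first thing to try.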
CoarseGrainedScheduler error
This error sometimes occurs after Halvade has finished and is caused by a problem while shutting down the YARN application. It does not affect the output of Halvade as long as the total time has been printed, for example:

    21/07/02 15:53:30 INFO SomaticPipeline$: [Somatic variant calling pipeline] total time (s): 6940.324333113

The error itself appears after Halvade has finished and looks like this:
    ...
    20/04/11 01:08:01 INFO spark.ChainedCommandBinaryPipedRDD: Removing RDD 26 from persistence list
    20/04/11 01:08:02 INFO spark.ChainedCommandBinaryPipedRDD: Removing RDD 70 from persistence list
    org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:157)
        at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:137)
        at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:186)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:527)
        ...
    ...
    java.util.concurrent.RejectedExecutionException: Task scala.concurrent.impl.CallbackRunnable@27276122 rejected from java.util.concurrent.ThreadPoolExecutor@6a3a6d71[Shutting down, pool size = 56, active threads = 0, queued tasks = 0, completed tasks = 346]
    ...
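The rule that the shutdown error is harmless once the total time has been printed can be checked automatically on a saved copy of the driver output. This is a minimal sketch under the assumption that the "total time (s):" line has the format of the example in this section; assess_run and the keys of its result are hypothetical names, not part of Halvade.

```python
import re

# Matches the "total time (s): <seconds>" line printed at the end of a run.
TOTAL_TIME = re.compile(r"total time \(s\):\s*([0-9]+(?:\.[0-9]+)?)")
SHUTDOWN_ERROR = "Could not find CoarseGrainedScheduler"


def assess_run(driver_output):
    """Report whether the run finished and whether the shutdown error occurred."""
    match = TOTAL_TIME.search(driver_output)
    shutdown_error = SHUTDOWN_ERROR in driver_output
    return {
        "finished": match is not None,
        "total_seconds": float(match.group(1)) if match else None,
        "shutdown_error": shutdown_error,
        # Harmless only if the error is present AND the total time was printed.
        "harmless_shutdown_error": shutdown_error and match is not None,
    }
```

A result with shutdown_error set but finished unset means the total time was never printed, so the run should be investigated as a real failure.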