Hyper-blocks can significantly improve instruction level parallelism on a wide range of super-scalar and VLIW processors. However, most hyper-block construction approaches aim at minimizing the average-case execution time of a program. In real-time embedded systems, minimizing the worst-case execution time (WCET) of a program is the primary goal of an optimizing compiler. We investigate the hyper-block construction problem for a program executed on a clustered VLIW processor such that the WCET of the program is minimized, and propose a novel heuristic approach considering tail duplications. Our approach is underpinned by a novel priority scheme and a precise tail duplication cost model for computing the WCET of a program. We have implemented our approach in Trimaran 4.0, and compared it with the state-of-the-art approach by using a set of 8 benchmark suites. The experimental results show that our approach achieves the maximum WCET improvement of 20.37% and the average WCET improvement of 11.59%, respectively.