Sufficient testing of real-world machine learning (ML) systems is essential because safety violations in these systems can have significant impacts on both individuals and society. Existing methods, based on the concept of combinatorial testing, identify weaknesses in ML models by leveraging attributes associated with the data. However, achieving sufficient test coverage, particularly with image data, is challenging because data exhibiting certain rare attributes are difficult to collect in real-world scenarios. In this talk, we present an approach that addresses these challenges by expanding test datasets using image-generative models. Through experiments that reproduce test data using an image-to-image model, we examine whether the generated data effectively detect weaknesses. We also discuss the practical challenges of generating test data with image-generative models from an application perspective.
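
The attribute-based coverage idea behind combinatorial testing can be illustrated with a minimal sketch. The attribute names and values below (weather, time of day, occlusion) are hypothetical examples, not the attributes used in our experiments; the sketch simply enumerates the attribute-value combinations a test set should cover and reports which ones lack any collected image, i.e. the candidates for image generation:

```python
from itertools import product

# Hypothetical image attributes for a test dataset;
# these names and values are illustrative only.
attributes = {
    "weather": ["clear", "rain", "snow"],
    "time_of_day": ["day", "night"],
    "occlusion": ["none", "partial"],
}

# Full combinatorial coverage: every attribute-value combination
# a sufficient test set would need to contain (3 * 2 * 2 = 12).
required = set(product(*attributes.values()))

# Attribute labels of images actually collected; rare combinations
# (e.g. snowy nights with occlusion) are typically missing.
collected = {
    ("clear", "day", "none"),
    ("clear", "night", "partial"),
    ("rain", "day", "none"),
}

# Combinations with no real example: candidates for generating
# additional test images with an image-generative model.
missing = sorted(required - collected)
print(f"{len(missing)} of {len(required)} combinations uncovered")
```

In practice, full enumeration grows exponentially with the number of attributes, which is why combinatorial testing typically targets lower-strength coverage (e.g. all pairs of attribute values) rather than all full combinations.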