Software testing is an approach to finding bugs in programs. Suppose your testing technique finds no bugs in your program. What does that really tell you? Is your program absolutely correct? Probably not. Is your testing technique effective? How do we even measure the effectiveness of a testing technique in the absence of bugs? And how can we automatically classify inputs as bug-revealing or not-bug-revealing? We are going to talk about the oracle problem, introduce a probabilistic framework that explains properties such as testing effectiveness (finding a maximal number of bugs), efficiency (finding the same bugs as fast as possible), and scalability (maximizing bug finding by distributing across a large number of machines), and map out fundamental limitations of existing testing techniques in terms of these properties.
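To make the classification question concrete, here is a minimal sketch of one common workaround for the oracle problem, differential testing: an input is classified as bug-revealing if two implementations of the same specification disagree. The functions `reference_sort` and `buggy_sort` are hypothetical stand-ins invented for illustration, not taken from the text.

```python
def reference_sort(xs):
    # Trusted reference implementation of the specification "sort the list".
    return sorted(xs)

def buggy_sort(xs):
    # Implementation under test; deliberately drops duplicates (a seeded bug).
    return sorted(set(xs))

def is_bug_revealing(xs):
    """Differential oracle: the input reveals a bug iff the outputs differ."""
    return reference_sort(xs) != buggy_sort(xs)

# Inputs with duplicate elements expose the seeded bug; the rest do not.
inputs = [[3, 1, 2], [1, 1, 2], [], [5, 5, 5]]
revealing = [xs for xs in inputs if is_bug_revealing(xs)]
print(revealing)  # [[1, 1, 2], [5, 5, 5]]
```

Note that this oracle only classifies inputs relative to the reference implementation: if both implementations share a bug, no input will be classified as bug-revealing, which is one face of the fundamental limitations discussed here.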