Staged rollout is a software deployment strategy that releases updates incrementally to a fraction of the user base to accelerate software testing and minimize adverse outcomes. This paper uses Q-learning to automate the decision-making process for staged rollouts during software development, balancing the time spent delivering new features against the downtime caused by potential failures. Two exploration strategies, $\epsilon$-greedy and upper confidence bound, are compared with each other and with a naive baseline approach. The results indicate that both Q-learning approaches offer greater flexibility than the baseline in dynamically balancing delivery time and downtime, suggesting that automating staged rollouts with Q-learning is feasible and provides more options for meeting stakeholder requirements while deploying reliable software.
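For readers unfamiliar with the two exploration strategies named above, the following is a minimal sketch of tabular Q-learning with $\epsilon$-greedy and upper confidence bound action selection. It is illustrative only and not the paper's implementation: the function names, the tabular representation, and the default values of `alpha`, `gamma`, and `c` are assumptions made here, and the paper's actual states, actions, and reward design are not specified in this abstract.

```python
import math
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def ucb(q_row, counts, t, c=2.0):
    """Upper confidence bound selection: value estimate plus exploration bonus.

    counts[a] is how often action a was taken in this state, and t is the
    total number of selections made so far in this state (t >= 1).
    """
    # Try every action once before applying the bonus formula.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(
        range(len(q_row)),
        key=lambda a: q_row[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update applied in place to table q."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

In a staged-rollout setting, one plausible (hypothetical) mapping would treat the current rollout stage as the state, decisions such as expanding or holding the rollout as actions, and a reward that penalizes both delivery time and downtime; the relative weight of those two penalties is what lets the learned policy trade one off against the other.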