Code readability is essential for software maintenance, yet it is difficult to assess at scale: traditional metrics fail to capture subjective human judgments, while manual evaluation is expensive and inconsistent. We introduce CoReEval, a benchmark comprising 1.4 million evaluations across 10 LLMs and 656 code snippets in Java, Python, and CUDA. Using this benchmark, we investigate how prompting strategies, decoding parameters, and persona framing influence LLM alignment with human judgments. LLMs exhibit moderate alignment with human judgments for Java (r=0.25) but near-zero alignment for Python (r=0.02) and CUDA (r=0.00). Developer-guided prompting dramatically improves aspect coverage (e.g., Java test Structure: from 15% to 94%) but also increases score variance. These results suggest that, when configured with human-centric criteria, LLMs are promising tools for interpretable code readability assessment in automated code review, developer education, and CI/CD pipelines.