This work explores the feasibility and challenges of using a Large Language Model (LLM) to automatically assess the quality of software architecture diagrams. Our approach is based on a structured prompt that guides the LLM to evaluate architecture diagrams and their accompanying descriptions against five core quality criteria: clarity, consistency, completeness, accuracy, and level of detail. Preliminary experimental results obtained with OpenAI’s ChatGPT-4o on four open-source projects suggest that LLMs can provide valuable feedback and detect diagrammatic inconsistencies, often in alignment with human expert evaluations. However, the LLM also struggled with context-specific design choices, sometimes misjudging deliberate omissions or the appropriate level of detail, indicating that human oversight remains indispensable. To guide researchers and practitioners, we further propose practical guidelines for data preparation, prompt construction, and result interpretation, aiming to maximize the reliability and utility of LLM-based architectural evaluations.
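To make the overall idea concrete, the sketch below shows one possible way such a criteria-based assessment could be issued through the OpenAI Python client. It is not the prompt used in this work: the prompt wording, the `assess_diagram` helper, and the `gpt-4o` model identifier are illustrative assumptions.

```python
# Minimal sketch of an LLM-based diagram assessment, assuming the OpenAI
# Python SDK (v1) and a text description of the diagram as input.
from openai import OpenAI

# The five quality criteria named in the approach.
CRITERIA = ["clarity", "consistency", "completeness", "accuracy", "level of detail"]

def assess_diagram(diagram_description: str, client: OpenAI) -> str:
    """Ask the model to rate a diagram and its description on each criterion."""
    prompt = (
        "You are reviewing a software architecture diagram and its accompanying description.\n"
        "For each criterion below, give a rating from 1 to 5 and a brief justification:\n"
        + "\n".join(f"- {c}" for c in CRITERIA)
        + f"\n\nDiagram and description:\n{diagram_description}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model identifier
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage (hypothetical input file):
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# print(assess_diagram(open("architecture_description.md").read(), client))
```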