I tried GPT-5.4, and most answers were really good – but a few had me concerned

GPT-5.4 Thinking: Deep Analysis, Stubborn Execution

OpenAI’s latest release, GPT-5.4 Thinking, marks a significant leap in AI reasoning capabilities—but with some notable quirks. Unlike incremental updates, this model jumps from 5.2 to 5.4 and introduces a “thinking” mode designed for complex problem-solving.

Key Takeaways

  • Superior text analysis: Delivers thoughtful, in-depth responses to complex queries
  • Reasoning issues: Sometimes answers questions you didn’t ask rather than the ones posed
  • Format limitations: Struggles with proper formatting and image generation quality

My Testing Experience

I evaluated GPT-5.4 Thinking across four distinct challenges using ChatGPT Plus ($20/month). The model consistently produced high-quality text analysis but exhibited frustrating behavior—it would often ignore my specific instructions and pursue its own interpretation instead.

Test 1: Aircraft Carrier Design Challenge

When asked to design a flying aircraft carrier, GPT-5.4 Thinking provided excellent engineering analysis, correctly identifying why downward-facing propellers are impractical. However, it failed at image generation, repeatedly producing the same incorrect visual despite detailed instructions.

Test 2: Boston Travel Itinerary

The AI created comprehensive travel plans for both luxury and budget travelers, including cost breakdowns and weather contingency planning. While the information was solid, formatting remained problematic—delivering massive numbered lists that required manual reformatting.

Test 3: Social Media Impact Analysis

This is where GPT-5.4 Thinking truly shone. It delivered a nuanced, 1,300-word analysis of social media's societal impact, concluding that despite its benefits, social media has "worsened communication overall." The depth and balance of this response significantly exceeded earlier models.

Test 4: Educational Constructivism Explanation

Here GPT-5.4 Thinking completely missed the mark. Rather than using “learning by doing” to explain itself (as requested), it delivered a theoretical essay and offered alternative formats—none of which actually followed the constructivist approach I asked for.

The Verdict

GPT-5.4 Thinking performs like a brilliant graduate student who needs constant supervision. The text quality is exceptional, but the model’s tendency to ignore specific instructions is concerning. For professional tasks, this “thinking” model requires diligent oversight—it won’t reliably follow directions without correction.

The image generation failures are particularly troubling given claims about professional-level performance. While GPT-5.4 Thinking can assist professionals, users must remain extremely vigilant about monitoring outputs.

This raises important questions about future AI agents: Will they become more helpful or harder to control? Should we accept AI that insists on its own interpretation over our instructions?

Tags

GPT-5.4, ChatGPT, OpenAI, AI reasoning, thinking models, text analysis, image generation, travel planning, social media analysis, educational theory, AI limitations, professional tasks, AI agents, machine learning, natural language processing


