source: simon willison: gemini 3.5 flash: more expensive, but google plan to use it for everything
level: technical
google launched gemini 3.5 flash at google i/o, skipping a preview phase and going straight to general availability. the model is now live for billions of users through the gemini app, ai mode in google search, and developer platforms like google antigravity and ai studio. enterprises can access it via the gemini enterprise agent platform. the model id is gemini-3.5-flash, with a knowledge cutoff of january 2025, supporting over one million input tokens and up to 65,536 output tokens.
the new flash model comes with a significant price hike. at $1.50 per million input tokens and $9 per million output tokens, it costs three times more than gemini 3 flash preview and six times more than gemini 3.1 flash-lite. this pricing approaches that of gemini 3.1 pro, which is $2 and $12. google also teased gemini 3.5 pro for next month, likely at an even higher price. this follows a broader trend of rising api costs across major ai labs, with openai and anthropic also increasing prices for their latest models.
benchmarking data from artificial analysis shows the real cost impact. running their proprietary benchmark on gemini 3.5 flash in high mode cost $1,551.60, compared to $892.28 for gemini 3.1 pro preview and just $278.26 for gemini 3 flash preview. for context, claude opus 4.7 in adaptive reasoning mode cost $5,117.14, and gpt-5.5 in xhigh mode cost $3,357.00. a simple test generating an svg of a pelican on a bicycle used 14,403 output tokens and cost about 13 cents, illustrating how token-heavy tasks can add up quickly.
why it matters: rising api costs for flagship models affect budgeting for ai projects, making it important to track pricing trends and benchmark real usage costs.
source: simon willison: gemini 3.5 flash: more expensive, but google plan to use it for everything