Amazon's Strategic Overhaul to Address GPU Shortage
Last year, Amazon faced a significant challenge within its retail operations: it could not secure enough artificial intelligence (AI) chips, specifically graphics processing units (GPUs), which are crucial for many of its technology projects. The shortage delayed numerous initiatives and put the timely launch of key projects across Amazon's e-commerce and logistics operations at risk. The details surrounding this situation have emerged from a comprehensive set of internal documents obtained by Business Insider.
As the generative AI boom gained momentum early last year, thousands of companies competed to access the necessary infrastructure to leverage this transformative technology. However, Amazon employees reported experiencing prolonged periods without access to GPUs, resulting in significant delays that hampered project timelines across the company's retail division. This division encompasses not only the e-commerce platform but also Amazon's extensive logistics operations.
In response to these challenges, Amazon initiated a bold revamp of its internal processes and technology. The cornerstone of this initiative was the launch of Project Greenland in July, which aimed to create a centralized GPU capacity pool. This strategic move was designed to enhance the management and allocation of the limited GPU supply. According to the documents, the company also instituted stricter protocols for internal GPU utilization.
A guideline highlighted within the documents emphasized the value of GPUs, stating, "GPUs are too valuable to be given out on a first-come, first-served basis." Instead, it proposed that distribution should be based on return on investment (ROI) and thoughtful considerations that would contribute to the long-term growth of the company's free cash flow.
The global GPU shortage is now in its second year, with demand still outstripping supply even among major AI firms: OpenAI CEO Sam Altman said his organization had run out of GPUs following a model launch, and Nvidia, the leading GPU supplier, has indicated that it would face supply constraints this year. Internal forecasts at Amazon, however, suggested that the situation would improve, with the GPU crunch expected to ease in the coming months.
In an email to Business Insider, an Amazon spokesperson said that the retail division, which sources GPUs through Amazon Web Services (AWS), now has full access to the AI processors. "Amazon has ample GPU capacity to continue innovating for our retail business and other customers across the company," the spokesperson said. The company recognized early on that innovations in generative AI were driving rapid adoption of cloud computing services, prompting it to quickly evaluate its growing GPU needs and act to meet them.
To optimize GPU distribution further, Amazon now imposes data-driven requirements on every internal GPU request. Each initiative is prioritized and ranked on criteria including the completeness of the data presented and the potential financial benefit per GPU. Projects seeking an allocation must demonstrate that they are shovel-ready for development, show that they are in a competitive position for market entry, and provide a timeline for when the anticipated benefits will be realized.
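The documents describe these ranking criteria but not an exact scoring formula, so the sketch below is only a rough illustration of how a benefit-per-GPU prioritization could be expressed. The field names, weighting, and `allocate` helper are hypothetical and are not Amazon's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class GpuRequest:
    project: str
    gpus_requested: int
    est_annual_benefit_usd: float  # projected financial benefit attributed to the project
    data_completeness: float       # 0.0-1.0 score for how complete the submitted data is
    shovel_ready: bool             # development could begin as soon as GPUs are granted
    months_to_benefit: int         # time until the anticipated benefit is realized


def priority_score(req: GpuRequest) -> float:
    """Rank a request by estimated benefit per GPU, discounted for
    incomplete data and slow payback; unready projects sort last."""
    if not req.shovel_ready or req.gpus_requested <= 0:
        return 0.0
    benefit_per_gpu = req.est_annual_benefit_usd / req.gpus_requested
    return benefit_per_gpu * req.data_completeness / max(req.months_to_benefit, 1)


def allocate(requests: list[GpuRequest], available_gpus: int) -> dict[str, int]:
    """Grant the limited pool to the highest-scoring requests first."""
    allocation: dict[str, int] = {}
    for req in sorted(requests, key=priority_score, reverse=True):
        if priority_score(req) <= 0 or available_gpus == 0:
            continue
        grant = min(req.gpus_requested, available_gpus)
        allocation[req.project] = grant
        available_gpus -= grant
    return allocation
```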
Looking ahead, an internal document from late 2024 indicated that Amazon's retail unit planned to distribute GPUs to the highest-priority initiatives as supply became available in the first quarter of 2025. The overarching goal for Amazon's retail business remains to ensure that cloud infrastructure expenses yield the highest return on investment, either through revenue growth or by lowering service costs.
Amazon has formalized its GPU allocation strategy into a set of tenets, internal guidelines intended to streamline decision-making. These tenets underscore the importance of return on investment, selective approvals, and an emphasis on speed and efficiency. If a project fails to deliver, it risks having its allocated GPUs reassigned to higher-value initiatives.
The eight tenets outlined in the documents emphasize that:
- ROI and high-judgment thinking are critical for GPU prioritization.
- Distribution should be informed by a combination of ROI and practical considerations.
- Decisions should not be made in isolation; GPU-related initiatives should be centralized.
- Time is of the essence, and efficient tooling is necessary for swift distribution decisions.
- Innovation is fostered through optimal resource utilization and collaboration.
- A level of acceptable risk is embraced to encourage research and development.
- Transparency around the allocation methodology is encouraged, while sensitive R&D information is kept confidential.
- Allocated GPUs may be recalled if projects yield lower value than anticipated.
To facilitate better management of GPU supply and demand, Amazon introduced Project Greenland. This initiative is described as a centralized GPU orchestration platform that allows for tracking of GPU usage across various teams, optimizing capacity utilization, and enabling the reallocation of resources to more pressing projects. The implementation of this system is expected to enhance operational efficiency by reducing idle capacity while ensuring that the needs of all departments are met.
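The documents do not detail how Greenland is implemented, so the following is a minimal sketch of the general idea under stated assumptions: a central pool that grants capacity, tracks per-team utilization, and reclaims idle allocations for reassignment. All class and method names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    team: str
    project: str
    gpus: int
    utilization: float = 0.0  # fraction of the allocated GPUs actually in use


@dataclass
class GpuPool:
    """Toy model of a centralized pool: grants capacity, tracks usage,
    and claws back allocations that sit idle."""
    capacity: int
    allocations: list[Allocation] = field(default_factory=list)

    @property
    def free(self) -> int:
        return self.capacity - sum(a.gpus for a in self.allocations)

    def grant(self, team: str, project: str, gpus: int) -> bool:
        """Approve a request only if spare capacity exists."""
        if gpus > self.free:
            return False  # request must wait or be re-ranked against others
        self.allocations.append(Allocation(team, project, gpus))
        return True

    def reclaim_idle(self, min_utilization: float = 0.5) -> int:
        """Return under-used GPUs to the pool so they can be reassigned
        to higher-value projects."""
        idle = [a for a in self.allocations if a.utilization < min_utilization]
        self.allocations = [a for a in self.allocations if a.utilization >= min_utilization]
        return sum(a.gpus for a in idle)
```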
In line with the urgency surrounding AI advancements, Amazon is deploying its GPUs across a wide range of AI projects. One internal document listed over 160 AI-driven initiatives, including the Rufus shopping assistant and the Theia product-image generator. Additional projects include:
- A vision-assisted package retrieval service to help drivers quickly locate packages from delivery vans.
- An automatic data aggregation service to ensure consistent product information.
- An AI model designed to optimize delivery routes and package handling for enhanced efficiency.
- An upgraded customer service AI to address return inquiries using natural language processing.
- A system for automating seller fraud investigations and verifying compliance with documentation.
Last year, Amazon's retail division estimated that its AI investments contributed approximately $2.5 billion in operating profits, along with $670 million in variable cost savings. While projections for these metrics in 2025 remain uncertain, the company is poised to continue its significant investments in AI technology. As of early this year, Amazon's retail sector anticipated spending around $1 billion on GPU-powered projects, with overall expenditures on AWS cloud infrastructure projected to rise from $4.5 billion in 2024 to $5.7 billion in 2025.
During the previous year, Amazon's ambitious slate of AI projects exerted considerable strain on its GPU supply. Reports from December indicated a notable shortage of over 1,000 P5 instances, AWS's cloud server type equipped with up to eight Nvidia H100 GPUs. Initially, this shortage was expected to slightly improve by early this year and transition to a surplus later in 2025. However, an Amazon spokesperson recently indicated that those estimates were now outdated, asserting that there is no longer a GPU shortage.
Furthermore, AWS's in-house AI chip, Trainium, is anticipated to meet the retail division's demands by the end of 2025. This aligns with CEO Andy Jassy's statements from earlier this year, which suggested that GPU and server constraints would ease by the latter half of the year. Nevertheless, the company continues to express concerns about GPU supply, as evidenced by a recent job listing from the Greenland team, which described the unprecedented growth in GPU demand as a defining challenge of the current era. The listing posed a critical question: "How do we get more GPU capacity?"