Founded by passionate advocates of learning and innovation, Learni set out to make professional training accessible to everyone, everywhere in the world. Our team operates in major cities such as Paris, Lyon, and Marseille, as well as internationally, supporting talent and organizations in developing their skills.
The TensorRT-LLM 2026 - Optimizing LLM Inferences in Production training is delivered in person or remotely (blended learning, e-learning, virtual classroom, or remote instructor-led sessions). At Learni, a Qualiopi-certified training organization, each program is designed to maximize skills acquisition, regardless of the training mode chosen.
The trainer alternates between demonstrative, interrogative, and active methods (through practical exercises and/or real-world scenarios). This pedagogical approach ensures concrete and directly applicable learning in the workplace.
To ensure the quality of the TensorRT-LLM 2026 - Optimizing LLM Inferences in Production training, Learni provides the following teaching resources:
For in-house training held at a location external to Learni, the client commits to providing all the teaching materials (IT equipment, internet connection, etc.) needed for the training to run properly, in accordance with the prerequisites stated in the training program provided.
Skills acquired during the TensorRT-LLM 2026 - Optimizing LLM Inferences in Production training are assessed through:
Learni is committed to the accessibility of its professional training programs. All our training programs are accessible to people with disabilities. Our teams are available to adapt teaching methods to your specific needs. Do not hesitate to contact us for any accommodation request.
Learni training programs are available for inter-company and intra-company settings, both in-person and remote. Registration is possible up to 48 business hours before the start of training. Our programs are eligible for OPCO, Pôle emploi, and FNE-Formation funding. Contact us to discuss your training project and funding possibilities.
Guided discovery of the TensorRT-LLM 2026 environment: quick installation on an NVIDIA GPU, creation of optimized Docker containers, and hands-on work with the build tools to convert a PyTorch LLM into a TensorRT engine. Practical exercises cover Llama or Mistral with basic quantization and generation of your first accelerated tokens, followed by a before/after performance comparison and an initial evaluation report to validate your professional skills.
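To give a feel for the quantization step above, the memory savings from reducing weight precision can be sketched with a back-of-the-envelope calculation. The 7-billion-parameter figure is an illustrative Llama/Mistral-class example, not a specific course model:

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to store model weights, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Illustrative 7-billion-parameter model (Llama/Mistral class)
for precision, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{precision}: {weight_memory_gb(7e9, bits):.1f} GB")
# FP16: 14.0 GB, INT8: 7.0 GB, INT4: 3.5 GB
```

This is why INT8 or INT4 quantization is often the difference between a model fitting on a single GPU or not; activation memory and KV cache come on top of these figures.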
Deep dive into TensorRT-LLM optimization techniques: enabling FlashAttention-2 to reduce latency, configuring the paged-attention KV cache for large contexts, and experimenting with multi-GPU tensor parallelism. Exercises cover batch-size tuning for maximum throughput and real enterprise cases with open-source models, with precise measurement using NVIDIA Nsight, ending in a personalized optimization plan to make your inference up to 5x faster.
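The KV-cache sizing behind the paged-attention exercise can be estimated with a short calculation. The layer, head, and dimension values below are Llama-2-7B-like assumptions, and the 64-token block size is an illustrative paged-attention page size, not a TensorRT-LLM default:

```python
import math

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies (K and V, across all layers)."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

def paged_blocks(seq_len: int, block_size: int = 64) -> int:
    """Fixed-size token blocks a paged KV cache allocates for one sequence."""
    return math.ceil(seq_len / block_size)

# Llama-2-7B-like shape in FP16 (assumed values)
per_token = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128)
print(per_token)                  # 524288 bytes, i.e. 0.5 MiB per token
batch, context = 8, 4096
total_gib = per_token * batch * context / 2**30
print(f"{total_gib:.0f} GiB")     # 16 GiB for batch 8 at 4096 tokens
print(paged_blocks(context))      # 64 blocks of 64 tokens per sequence
```

The point of the paged layout is that a sequence wastes at most one partially filled block, so batch size and context length can be pushed much closer to the GPU's memory limit than with contiguous preallocation.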
Mastery of TensorRT-LLM deployment: serving scalable APIs via Triton Inference Server, integrating with FastAPI or gRPC for enterprise services, and securing endpoints with authentication. The module also covers real-time GPU metrics monitoring with DCGM and Prometheus, horizontal-scaling exercises, resolution of real production bottlenecks, and load testing with Locust, concluding with a deployable capstone project and complete documentation for immediate enterprise implementation.
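Two of the production concerns above, endpoint authentication and Prometheus-style metrics, can be sketched in a few lines of plain Python. The token and metric names are illustrative placeholders, not Triton, DCGM, or course-specific APIs:

```python
API_TOKEN = "example-secret"  # illustrative; load from a secret store in production

def is_authorized(headers: dict) -> bool:
    """Check the bearer token on an incoming inference request."""
    return headers.get("Authorization") == f"Bearer {API_TOKEN}"

def metrics_page(gpu_util_pct: float, queue_depth: int) -> str:
    """Render metrics in the Prometheus text exposition format
    ("name value" lines), as a /metrics endpoint would serve them."""
    return (f"gpu_utilization_percent {gpu_util_pct}\n"
            f"inference_queue_depth {queue_depth}\n")

print(is_authorized({"Authorization": "Bearer example-secret"}))  # True
print(is_authorized({}))                                          # False
print(metrics_page(87.5, 3))
```

In a real deployment the auth check would live in FastAPI middleware or a Triton-fronting gateway, and the metrics would be scraped by Prometheus from DCGM and Triton's own exporters rather than hand-formatted, but the request/response shapes are the same.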
Target audience
ML engineers, data scientists, and AI developers looking to build enterprise LLM-optimization skills
Prerequisites
Python basics, PyTorch or TensorFlow, basic knowledge of LLMs and CUDA





























