Founded by passionate advocates of learning and innovation, Learni set out to make professional training accessible to everyone, everywhere in the world. Our teams work in major cities such as Paris, Lyon, and Marseille, as well as internationally, to support professionals and organizations in developing their skills.
The TensorRT-LLM 2026 - Accelerate LLM Inference x10 in Production training is delivered in person or remotely (blended learning, e-learning, virtual classroom, or remote attendance of in-person sessions). At Learni, a Qualiopi-certified training organization, each program is designed to maximize skills acquisition, whichever training mode you choose.
The trainer alternates between demonstrative, interrogative, and active methods (through practical exercises and/or real-world scenarios). This pedagogical approach ensures concrete and directly applicable learning in the workplace.
To ensure the quality of the TensorRT-LLM 2026 - Accelerate LLM Inference x10 in Production training, Learni provides the following teaching resources:
For in-house training delivered at a location outside Learni's premises, the client commits to providing all the teaching materials (IT equipment, internet connection...) needed for the proper delivery of the training, in accordance with the prerequisites stated in the training program provided.
Skills acquired during the TensorRT-LLM 2026 - Accelerate LLM Inference x10 in Production training are assessed through:
Learni is committed to the accessibility of its professional training programs. All our training programs are accessible to people with disabilities. Our teams are available to adapt teaching methods to your specific needs. Do not hesitate to contact us for any accommodation request.
Learni training programs are available for inter-company and intra-company settings, both in-person and remote. Registration is possible up to 48 business hours before the start of training. Our programs are eligible for OPCO, Pôle emploi, and FNE-Formation funding. Contact us to discuss your training project and funding possibilities.
Discover the complete installation of TensorRT-LLM 2026 on NVIDIA GPU environments, configure Docker and CUDA for fast builds, test first engines on Llama and Mistral models through practical exercises, generate your first performance profiles, integrate Python scripts for automation, validate setups with real-time latency checks, and prepare the ground for the advanced optimizations covered later in the course.
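To make this first step concrete, here is a minimal post-install smoke test sketched with the high-level LLM API that ships with recent TensorRT-LLM releases. The TinyLlama checkpoint name is illustrative, and import paths and defaults vary between versions, so treat this as a starting point rather than the course's exact lab material.

```python
# A minimal smoke test, assuming a working CUDA + TensorRT-LLM install
# and the high-level LLM API from recent TensorRT-LLM releases
# (import paths and defaults vary between versions -- check your docs).
from tensorrt_llm import LLM, SamplingParams

def main():
    # Model name is illustrative; any Hugging Face checkpoint supported
    # by your TensorRT-LLM version works here.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    prompts = ["Explain KV caching in one sentence."]
    params = SamplingParams(max_tokens=64, temperature=0.8)

    # generate() returns one result per prompt; print the first completion.
    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```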
Dive into PyTorch-to-TensorRT-LLM 2026 conversion, apply layer fusion and custom kernels to accelerate inference, build engines for GPT-J and BLOOM with the CLI tools, measure gains with TensorRT benchmarks, optimize long contexts up to 128k tokens, work through exercises on real enterprise datasets, produce production-ready engine deliverables, and consolidate professional deep learning skills.
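The conversion and build flow typically happens in two steps: convert the Hugging Face checkpoint into the TensorRT-LLM checkpoint format, then compile it into an engine. The sketch below follows the pattern of the project's example scripts; the script path, flag names, and local directories are assumptions that change between releases, so verify them against your installed version.

```python
# A hedged sketch of the two-step checkpoint-conversion / engine-build
# flow. Script paths and flags follow the pattern used in TensorRT-LLM's
# examples but differ between releases -- treat them as placeholders.
import subprocess

HF_MODEL = "meta-llama/Llama-2-7b-hf"   # illustrative checkpoint
CKPT_DIR = "./ckpt-llama-7b"            # hypothetical local paths
ENGINE_DIR = "./engine-llama-7b"

# Step 1: convert Hugging Face weights into the TensorRT-LLM checkpoint
# format (per-model script shipped in the examples/ tree).
subprocess.run([
    "python", "examples/llama/convert_checkpoint.py",
    "--model_dir", HF_MODEL,
    "--output_dir", CKPT_DIR,
    "--dtype", "float16",
], check=True)

# Step 2: compile the checkpoint into a TensorRT engine; this is where
# layer fusion and kernel selection happen.
subprocess.run([
    "trtllm-build",
    "--checkpoint_dir", CKPT_DIR,
    "--output_dir", ENGINE_DIR,
    "--max_batch_size", "8",
], check=True)
```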
Master FP8 and INT4 quantization in TensorRT-LLM 2026 to cut memory use by up to 4x with minimal accuracy loss, implement optimized KV cache and paged attention, test on 70B LLMs with dynamic batching, analyze performance/quality trade-offs with NVIDIA Nsight tools, apply the techniques to concrete RAG and chatbot cases, generate quantified performance reports, and strengthen expertise for certified enterprise deployments.
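As a rough illustration of how quantization plugs into the same workflow, the sketch below enables FP8 through a quantization config in the high-level API. The QuantConfig/QuantAlgo import path reflects recent releases and is an assumption to verify against your version's documentation.

```python
# A minimal sketch of enabling FP8 quantization through the LLM API.
# The QuantConfig / QuantAlgo import path is an assumption based on
# recent TensorRT-LLM releases -- verify it against your version.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

# FP8 for weights and activations; other algorithms (e.g. INT4 AWQ)
# are selected the same way via QuantAlgo.
quant = QuantConfig(quant_algo=QuantAlgo.FP8)

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # illustrative checkpoint
    quant_config=quant,
)

out = llm.generate(["Summarize paged attention briefly."],
                   SamplingParams(max_tokens=48))
print(out[0].outputs[0].text)
```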
Deploy TensorRT-LLM 2026 on Kubernetes with NVIDIA Helm charts, scale across multi-GPU A100/H100 nodes via tensor parallelism, integrate Triton Inference Server for REST/gRPC APIs, monitor latency and throughput with Prometheus, test high-load workloads at 1,000 req/s, secure endpoints with TLS authentication, produce scalable architectures for IT systems, and apply them in simulated enterprise projects.
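On the client side, a deployed engine behind Triton can be exercised with a plain HTTP call to the server's generate endpoint. The sketch below assumes a local deployment and the "ensemble" model name and "text_input"/"text_output" field names commonly seen in the tensorrtllm_backend examples; your model repository may name things differently.

```python
# A hedged client-side sketch: calling a Triton Inference Server that
# hosts a TensorRT-LLM engine through the HTTP "generate" extension.
# The endpoint shape is standard Triton; the model name and payload
# field names depend on how your model repository is configured.
import requests

TRITON_URL = "http://localhost:8000"  # assumed local deployment
MODEL = "ensemble"                    # common default in the
                                      # tensorrtllm_backend examples

payload = {
    "text_input": "List two benefits of dynamic batching.",
    "max_tokens": 64,
}

resp = requests.post(
    f"{TRITON_URL}/v2/models/{MODEL}/generate",
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("text_output"))
```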
Complete a full capstone project: optimize a custom LLM with TensorRT-LLM 2026, tune hyperparameters for x10 performance, debug bottlenecks via TensorRT traces, integrate DCGM monitoring, present quantified ROI on costs and latency, exchange best practices in small groups, receive expert feedback, and take away post-training resources for lasting professional AI skills.
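For the kind of quantified latency/throughput report the capstone asks for, a small benchmark loop is often enough as a first pass. The sketch below reuses the assumed Triton endpoint from the previous example and computes p50/p95 latency plus aggregate throughput; for serious load testing at 1,000 req/s you would switch to a concurrent load generator.

```python
# A self-contained sketch of a first-pass latency/throughput report:
# fire N sequential requests at a generate endpoint and compute
# p50/p95 latency and aggregate throughput. URL and payload mirror
# the assumed Triton deployment above.
import statistics
import time

import requests

URL = "http://localhost:8000/v2/models/ensemble/generate"  # assumed
N_REQUESTS = 50

latencies = []
start = time.perf_counter()
for i in range(N_REQUESTS):
    t0 = time.perf_counter()
    requests.post(URL, json={"text_input": f"Request {i}",
                             "max_tokens": 32}, timeout=60)
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start

latencies.sort()
p50 = statistics.median(latencies)
p95 = latencies[int(0.95 * len(latencies)) - 1]  # simple p95 estimate
print(f"p50 latency: {p50 * 1000:.1f} ms")
print(f"p95 latency: {p95 * 1000:.1f} ms")
print(f"throughput:  {N_REQUESTS / elapsed:.1f} req/s")
```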
Target audience
Senior ML engineers, data scientists, AI architects, and AI DevOps professionals in companies aiming to upskill in LLM inference optimization
Prerequisites
Advanced proficiency in PyTorch or TensorFlow, CUDA/GPU experience, production deployment of LLMs, Docker containerization basics