Is the training Qualiopi certified?

Yes, Learni is a Qualiopi-certified training organization. This certification guarantees the quality of our courses and enables funding through OPCO and other mechanisms.

How can I fund my training?

Our courses are eligible for OPCO funding. Learni supports you through the funding process. A personalized quote is available upon request.

What are the prerequisites for this training?

The prerequisites for the Training vLLM - Accelerating Inference for Large LLM Models training are: Python basics, knowledge of machine learning and large language models. A preliminary interview validates your eligibility.

Is the training available in person?

Yes, the Training vLLM - Accelerating Inference for Large LLM Models training is available in remote. The format can be adapted to your needs (inter, intra, custom).

Training vLLM - Accelerating Inference for Large... | Learni

The story of Learni

Founded by passionate advocates of learning and innovation, Learni set out to make professional training accessible to everyone, everywhere in the world. Our team works in the largest cities such as Paris, Lyon, Marseille, and internationally, to support talents and organizations in their skills development.

Configure your training

Response within 24hNo commitment100% free

8 spots remaining for the next session

FormatParticipantsDateDetails

Which format do you prefer?

Methods, approaches, and teaching resources

The Training vLLM - Accelerating Inference for Large LLM Models training is delivered in-person or remotely (blended-learning, e-learning, virtual classroom, remote in-person). At Learni, a Qualiopi-certified training organization, each program is designed to maximize skills acquisition, regardless of the training mode chosen.

The trainer alternates between demonstrative, interrogative, and active methods (through practical exercises and/or real-world scenarios). This pedagogical approach ensures concrete and directly applicable learning in the workplace.

Teaching resources provided

To ensure the quality of the Training vLLM - Accelerating Inference for Large LLM Models training, Learni provides the following teaching resources:

Mac or PC computers, high-speed fiber internet, whiteboard or flipchart, video projector or interactive touchscreen (for remote sessions)
Training environments installed on workstations or accessible online
Course materials, practical exercises, and supplementary resources
Post-training access to materials and teaching resources

For in-house training at a location external to Learni, the client ensures and commits to having all necessary teaching materials (IT equipment, internet connection...) for the proper conduct of the training action in accordance with the prerequisites indicated in the communicated training program.

* contact us for remote feasibility** ratio varies depending on the training followed

Assessment methods

The assessment of skills acquired during the Training vLLM - Accelerating Inference for Large LLM Models training is carried out through:

During training: case studies, practical exercises, and professional scenarios
At the end of training: self-assessment questionnaire and skills evaluation by the trainer
After training: training completion certificate detailing acquired skills

Training accessibility

Learni is committed to the accessibility of its professional training programs. All our training programs are accessible to people with disabilities. Our teams are available to adapt teaching methods to your specific needs. Do not hesitate to contact us for any accommodation request.

Training objectives

Master the fundamentals of vLLM for professional, certified deployments

Install and configure a high-performance vLLM server in a corporate environment

Develop skills in inference optimization with PagedAttention

Implement scalable and efficient generative AI pipelines

Integrate vLLM into DevOps environments for production

Design fast LLM applications tailored to business needs

Training program

Module 1vLLM Fundamentals: Installation and First Models (Docker, HuggingFace, Python)

Immersive discovery of vLLM basics through quick installation of a dedicated environment, hands-on with Docker to containerize the server, loading HuggingFace models like Llama or Mistral, first real-time inference tests on complex prompts, practical exercises to compare performance with classic frameworks, creation of a first simple and secure API endpoint, with personalized feedback from the trainer on your configurations.

Module 2vLLM Optimization: PagedAttention and Memory Management (Quantization, Batching)

Deep dive into advanced PagedAttention mechanisms to reduce LLM memory consumption by a factor of 4, configuration of quantization and dynamic batching techniques, latency optimization on concrete business cases like chatbots or code generation, hands-on workshops to benchmark models on GPU/CPU, resolution of common bottlenecks, production of personalized performance reports, directly enhancing your skills in scalable AI production.

Module 3vLLM Deployment: Integration and Monitoring in Production (Kubernetes, Prometheus)

Mastery of vLLM deployment in Kubernetes clusters for high availability, integration with REST APIs and CI/CD pipelines, setup of monitoring via Prometheus and Grafana for real-time traceability, exercises on real-world cases like RAG or fine-tuning in inference, extreme load testing and horizontal scalability, finalization of a deployable continuous project for enterprise use, with delivery of ready-to-use scripts and an action plan for your professional context.