Founded by passionate advocates of learning and innovation, Learni set out to make professional training accessible to everyone, everywhere in the world. Our team operates in major cities such as Paris, Lyon, and Marseille, as well as internationally, supporting individuals and organizations in developing their skills.
Which format do you prefer?
30 free minutes with a training advisor — no commitment.
The Training Data Lake - Building Scalable Data Architectures training is delivered in person or remotely (blended learning, e-learning, virtual classroom, or remote instructor-led sessions). At Learni, a Qualiopi-certified training organization, each program is designed to maximize skills acquisition, regardless of the training mode chosen.
The trainer alternates between demonstrative, interrogative, and active methods (through practical exercises and/or real-world scenarios). This pedagogical approach ensures concrete and directly applicable learning in the workplace.
To ensure the quality of the Training Data Lake - Building Scalable Data Architectures training, Learni provides the following teaching resources:
For in-house training at a location external to Learni, the client commits to providing all teaching materials (IT equipment, internet connection, etc.) necessary for the proper conduct of the training session, in accordance with the prerequisites indicated in the training program provided.
The assessment of skills acquired during the Training Data Lake - Building Scalable Data Architectures training is carried out through:
Learni is committed to the accessibility of its professional training programs. All our training programs are accessible to people with disabilities. Our teams are available to adapt teaching methods to your specific needs. Do not hesitate to contact us for any accommodation request.
Learni training programs are available for inter-company and intra-company settings, both in-person and remote. Registration is possible up to 48 business hours before the start of training. Our programs are eligible for OPCO, Pôle emploi, and FNE-Formation funding. Contact us to discuss your training project and funding possibilities.
Dive into the key concepts of Data Lakes: evaluate hybrid architectures against traditional data warehouses and configure a test environment with AWS S3 or Azure Data Lake Storage Gen2. Explore schema-on-read to ingest raw data without prior transformation, and work through practical exercises on zonal modeling (raw, refined, curated). You will produce a personal architecture diagram and analyze real enterprise cases to identify common pitfalls, turning these skills into immediate professional assets.
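The two ideas above can be sketched in a few lines of plain Python: zone-prefixed object keys for the raw/refined/curated layout, and schema-on-read, where fields are discovered at query time rather than enforced at write time. The zone names follow the module; the dataset names and helper functions are illustrative, not part of the course material.

```python
import json

# Zone prefixes mirroring the raw/refined/curated layout; dataset and
# file names below are hypothetical examples.
ZONES = ("raw", "refined", "curated")

def zone_key(zone: str, dataset: str, filename: str) -> str:
    """Build an object-store key like 'raw/sensor_events/day1.json'."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"{zone}/{dataset}/{filename}"

def read_schema_on_read(lines):
    """Schema-on-read: parse raw JSON lines and discover the field set
    at read time instead of enforcing a schema on ingestion."""
    records = [json.loads(line) for line in lines]
    fields = sorted({k for rec in records for k in rec})
    return records, fields

raw_lines = ['{"id": 1, "temp": 21.5}', '{"id": 2, "temp": 19.0, "unit": "C"}']
records, fields = read_schema_on_read(raw_lines)
print(zone_key("raw", "sensor_events", "day1.json"))  # raw/sensor_events/day1.json
print(fields)  # ['id', 'temp', 'unit']
```

Note that the second record carries an extra `unit` field: schema-on-read tolerates this drift at ingestion and surfaces it only when the data is queried.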
Build batch and streaming ingestion flows using Apache Kafka for real-time data and Apache NiFi for visual orchestration, and integrate Apache Airflow to schedule complex pipelines. Test them on large datasets drawn from application logs and IoT sensors, and handle connectivity errors and resilience with advanced retry patterns. You will develop a complete pipeline from scratch with integrated monitoring and apply it to a concrete enterprise case, accelerating access to raw data and boosting your team's analytical productivity.
Optimize storage by converting data into columnar formats such as Parquet and ORC for fast queries, and implement Delta Lake for ACID compliance and time travel on your tables. Master Hive-style partitioning and Z-ordering to reduce unnecessary scans, migrate a legacy dataset to a refined zone through hands-on exercises, and analyze performance with real benchmarks. You will create a structured data catalog that lays the groundwork for scalable enterprise analyses, making your data skills immediately operational.
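Hive-style partitioning encodes partition values directly into directory names (`year=2026/month=11/...`), so engines such as Spark or Athena can skip directories that cannot match a query predicate. A minimal sketch, assuming an illustrative `s3://lake/refined/sales` table root:

```python
from datetime import date

def hive_partition_path(table_root: str, d: date) -> str:
    """Build a Hive-style partition path (year=/month=/day=) so query
    engines can prune directories that a predicate rules out."""
    return f"{table_root}/year={d.year}/month={d.month:02d}/day={d.day:02d}"

def prune_partitions(paths, year: int, month: int):
    """Naive partition pruning: keep only paths whose directory names
    match the predicate, mimicking how an engine skips the rest."""
    needle = f"year={year}/month={month:02d}/"
    return [p for p in paths if needle in p]

paths = [hive_partition_path("s3://lake/refined/sales", date(2026, m, 1))
         for m in (10, 11, 12)]
print(prune_partitions(paths, 2026, 11))
# ['s3://lake/refined/sales/year=2026/month=11/day=01']
```

Zero-padding the month and day keeps lexicographic ordering of paths consistent with chronological ordering, which simplifies range scans over listings.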
Process terabytes of data with Apache Spark in SQL and PySpark for distributed transformations, and query your Data Lake via Amazon Athena or Presto for ad-hoc analyses without heavy infrastructure. Develop cleansing and feature-engineering jobs on concrete business cases such as fraud detection, and optimize performance with caching and broadcast joins. You will integrate MLflow to track pipelines and produce actionable insights through a capstone dashboard, strengthening your skills for data-driven decisions in the enterprise.
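A broadcast join replicates a small dimension table to every worker so the large fact table never needs to be shuffled. Stripped of Spark, the same idea is a dictionary lookup, shown here together with a toy fraud-style feature; the merchant data and the amount threshold are invented for illustration.

```python
# Small "broadcast" side: a dimension table held entirely in memory.
merchants = {"m1": "grocery", "m2": "electronics"}

# Large side: the fact table, processed row by row (illustrative data).
transactions = [
    {"id": 1, "merchant": "m1", "amount": 40.0},
    {"id": 2, "merchant": "m2", "amount": 2500.0},
]

def enrich_and_flag(rows, dim, threshold=1000.0):
    """Join each transaction to its merchant category via dict lookup
    (the broadcast-join idea) and add a simple fraud-style feature
    flagging unusually large amounts."""
    for row in rows:
        yield {**row,
               "category": dim.get(row["merchant"], "unknown"),
               "suspicious": row["amount"] > threshold}

for rec in enrich_and_flag(transactions, merchants):
    print(rec)
```

In PySpark the equivalent would hint the small side with `broadcast()`; the point of the sketch is only the shape of the computation, not Spark's API.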
Secure your Data Lake with Apache Ranger for fine-grained ACLs and Kerberos for authentication, and catalog metadata via Apache Atlas for GDPR-compliant governance. Implement monitoring with Prometheus and Grafana to detect anomalies in real time, and deploy via CI/CD using GitHub Actions or Jenkins on a hybrid cloud. You will conduct a full audit of your capstone project with an improvement plan, simulate incident scenarios for maximum resilience, and conclude with an internal certification highlighting your professional data-management skills.
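Fine-grained access control of the kind Ranger provides boils down to evaluating policies that grant actions on resource prefixes to groups. The sketch below shows that evaluation logic only; the policy shape, group names, and paths are illustrative and do not reflect Apache Ranger's actual data model or API.

```python
# Hypothetical policies: each grants a set of actions on a path prefix
# to one group. An empty prefix matches everything.
POLICIES = [
    {"group": "analysts",  "prefix": "curated/", "actions": {"read"}},
    {"group": "engineers", "prefix": "",         "actions": {"read", "write"}},
]

def is_allowed(groups, path: str, action: str) -> bool:
    """Return True if any of the user's groups holds a policy covering
    the requested path prefix and action."""
    return any(p["group"] in groups
               and path.startswith(p["prefix"])
               and action in p["actions"]
               for p in POLICIES)

print(is_allowed({"analysts"}, "curated/sales/2026.parquet", "read"))  # True
print(is_allowed({"analysts"}, "raw/logs/app.json", "read"))           # False
```

Keeping authorization as data (policies) rather than code is what makes audits like the one described above tractable: the full permission surface can be listed and reviewed.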
Target audience
Data engineers, data architects, and BI managers seeking to strengthen their skills
Prerequisites
Mastery of SQL, knowledge of Big Data (Hadoop, Spark), and basic Python skills