
BIG DATA

Apache Spark (Big Data framework)

Overview

Apache Spark executes an entire chain of data analysis operations in memory, in a single pass and in near real time. Spark can be up to 100× faster than MapReduce for in-memory analysis. It has become the de facto solution for streaming analytics (sensors, real-time marketing campaigns, recommendations, sentiment analysis, log monitoring, ...).

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets. With Spark running on Apache Hadoop YARN, developers everywhere can now create applications to exploit Spark’s power, derive insights, and enrich their data science workloads within a single, shared dataset in Hadoop. The Hadoop YARN-based architecture provides the foundation that enables Spark and other applications to share a common cluster and dataset while ensuring consistent levels of service and response. Spark is now one of many data access engines that work with YARN in HDP.

Apache Spark consists of Spark Core and a set of libraries. The core is the distributed execution engine and the Java, Scala, and Python APIs offer a platform for distributed ETL application development.


Additional libraries, built atop the core, allow diverse workloads for streaming, SQL, and machine learning.

Spark is designed for data science, and its abstractions make data science easier. Data scientists commonly use machine learning, a set of techniques and algorithms that can learn from data. These algorithms are often iterative, and Spark's ability to cache the dataset in memory greatly speeds up such iterative processing, making Spark an ideal engine for implementing these algorithms.

Spark also includes MLlib, a library that provides a growing set of machine learning algorithms for common data science techniques: classification, regression, collaborative filtering, clustering, and dimensionality reduction.

Spark’s ML Pipeline API is a high-level abstraction that models an entire data science workflow. The ML pipeline package in Spark models a typical machine learning workflow and provides abstractions such as Transformer, Estimator, Pipeline, and Parameters. This abstraction layer makes data scientists more productive.



URÆUS is a registered trademark - Copyright 2025 - Powered by WordPress.