Building an Enterprise Data Lake and Migrating to the Cloud
Organizations are taking the digital route, which means most of their decisions are supported by data, if not fully automated and data-driven. This reliance on data-backed insights has created challenges that could not have been anticipated earlier. For a multinational healthcare company focused primarily on products to treat kidney disease and other chronic and acute medical conditions, the shift raised the problem of storing large volumes of data while keeping it easily accessible.


The Challenge
Data accessibility 
Working with multiple departments, teams, and touchpoints, each requiring its own data repository, creates data silos. As the quantity and diversity of data sets grew, silos mushroomed, leading to data security and data accessibility issues within the organization. Integrating and maintaining different data sources, avoiding duplication, and granting selective groups or individuals access to confidential data while preserving security was a tedious task. The growing number of data sets drove up maintenance costs for on-premises servers, and integrating data from the various silos was time-consuming because access to each was restricted.
Data Quality 
The organization's existing solutions could not manage the huge volume and variety of data, including unstructured data. In addition, because different siloed departments held separate copies of the same data, quality degraded drastically whenever data reconciliation or integration took place. The fragmented data created redundancy, leading to cost inefficiency, and its ambiguity and incompleteness also undermined decision-making.
Data swamp, Data management & Access management 
Inefficient data organization, inadequate data-cleaning strategies, a lack of metadata, and lax data governance had turned the repository into a data swamp. In addition, access management for groups and individuals was not uniform across the enterprise, leading to issues with data lineage and data security. This resulted in several problems:
  • Paying overheads to maintain unwanted data
  • Inability to understand the existing data due to lack of reference
  • Inability to access the data in a timely manner
The Solution
Birlasoft built an enterprise-level central data repository to enable multi-functional analytics
DaaS (Data as a Service) was used to embed data and analytics into core processes and decision-making. By migrating the siloed legacy data warehouse and data-processing systems to a data lake on AWS, Birlasoft addressed the problem of unintegrated and redundant data sets, improving data accessibility while adding data security.
The entire solution revolved around integrating the siloed data and replacing personal data processing and legacy ETL systems. The new data lake was built on Amazon S3, with S3 serving as the upstream source for a Snowflake data warehouse on AWS. Data processing was done using AWS EMR, PySpark, and Lambda to draw meaningful insights, and AWS Athena provided easy query access to the data in S3. Orchestration was handled with AWS Step Functions, Lambda, and CloudWatch. Finally, access management for the data lake was enabled through AWS IAM users and roles, with code versioning in Bitbucket.
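As a sketch of what one Lambda step in such an ingestion workflow might look like (the bucket name, prefixes, and event fields below are illustrative assumptions, not details from the engagement):

```python
from datetime import datetime, timezone


def lambda_handler(event, context=None):
    """Build the partitioned S3 target key for one ingestion batch.

    In a Step Functions workflow, a state like this could decide where an
    incoming file lands in the data lake before EMR/PySpark processes it.
    All names here (bucket, prefixes, event fields) are hypothetical.
    """
    source = event["source_system"]   # e.g. "erp", "crm"
    dataset = event["dataset"]        # e.g. "orders", "claims"
    now = datetime.now(timezone.utc)

    # Hive-style partitioning (year=/month=/day=) lets Athena and a
    # Snowflake external stage prune by date instead of scanning the
    # entire prefix.
    key = (
        f"raw/{source}/{dataset}/"
        f"year={now:%Y}/month={now:%m}/day={now:%d}/"
        f"{event['file_name']}"
    )
    return {"bucket": "example-datalake-raw", "key": key}
```

Keeping the landing path deterministic like this is one common way downstream jobs can locate each batch without a separate lookup.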
The data lake solution addressed the data swamp, eliminating redundancy and improving data governance. The previously siloed data now sits in a central data repository, improving cross-departmental partnerships and integration opportunities. Furthermore, access management for confidential and restricted data was handled seamlessly using AWS IAM.
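A minimal sketch of the kind of least-privilege IAM policy this implies, restricting a department to read-only access on its own data-lake prefix (bucket and prefix names are hypothetical):

```python
import json


def restricted_read_policy(bucket: str, prefix: str) -> str:
    """Return an IAM policy document (JSON) granting read-only access to
    a single data-lake prefix. In practice a policy like this would be
    attached to a per-department IAM group or role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Allow listing only within the department's prefix
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {   # Allow reading objects under that prefix, nothing else
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

Scoping `s3:ListBucket` with an `s3:prefix` condition, rather than granting it bucket-wide, is what keeps other departments' data invisible rather than merely unreadable.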
Technology stack: AWS IAM, EMR, S3 buckets, Lambda, Step Functions, Athena, Data Catalog, GitHub, Snowflake on AWS.
The Impact
  • Increased customer satisfaction with 360-degree data visibility.
  • Near real-time reporting backed by low latency ETL using EMR helps in better decision making with comprehensive, actionable insight. Data ingestion as a service resulted in 60% faster data onboarding time (batch processing time saved).
  • 40% shorter platform rollout time, giving a competitive advantage through reduced time-to-market (streamlined process for migrating fixes, faster release cycles, ability to refresh data in near real time).
  • 15%+ more team bandwidth with standardized deployment patterns for analytics apps & automation (DevOps).
  • Improved data security with global and local compliance adherence.
  • Improved data governance, better data quality, automated data reconciliation (cross applications, cross data sources) with enterprise data hub.
Other Benefits:
  • Accelerated data labeling through collaboration & tools (AI in the future), resulting in 40% fewer errors and less rework.
  • Standardized and modernized technologies to handle huge and unstructured data.
  • Better access to various data sources as most of the enterprise data is stored in EDH, which also improved the partnerships to drive cross-functional integration opportunities by eliminating the silos.
  • Cost-effective, low-maintenance platform with performance improvements, effective automated error handling, and a failover mechanism through Snowflake on AWS adoption.
  • Platform and data stability and consistency (standards, change control, glossary).
  • Trust and control in data (policies, verification, certified/watermarked data, use).
  • Right data to the right audience that yields value (use cases, KPIs, utilization, feedback).