
Leveraging AWS SageMaker Data Wrangler for Streamlining Credit Risk Assessment in Loan Applications
Financial Services
Logistics
Published:
Dec 13, 2024
Authors:
Sanchan Moses
In the loan approval process, credit risk assessment plays a crucial role in determining whether an applicant is likely to repay the loan. However, preparing data for risk modeling can be a complex and time-consuming task. In this article, we’ll explore how we leveraged AWS SageMaker Data Wrangler to simplify and accelerate the data processing pipeline for loan applications. By integrating Data Wrangler with AWS SageMaker, MongoDB Atlas, and other AWS services, we automated the data preparation and risk classification for each loan application, making it easier for loan approvers to assess applicants based on their credit risk scores.
Loan approvers can now quickly review applications, compare them against similar cases, and make informed decisions, which streamlines the approval process and improves overall efficiency.
Architecture Overview
The loan approval system we built integrates several AWS services and MongoDB Atlas to automate the end-to-end process, from loan application submission to approval. Here's a high-level overview of the architecture:

Mobile App: Loan applicants submit their data through a mobile app that syncs with MongoDB Atlas using Realm & Atlas Device Sync.
AWS SageMaker: A machine learning model classifies loan applicants into high-risk or low-risk categories based on their application data.
Web Admin Portal: Loan approvers review the applications, credit risk scores, and user profiles through a web portal, making real-time decisions.
Real-Time Sync: Loan decisions are synced back to the mobile app, providing applicants with immediate feedback.
This architecture integrates real-time data synchronization, intelligent risk scoring, and scalable AWS infrastructure to create an efficient and automated loan approval process.
Key Components and Their Roles
1. Mobile App with Realm & Atlas Device Sync
Loan applicants use a mobile app to submit their loan applications, which include personal details like income, credit history, and requested loan amount. The app interacts with MongoDB Atlas via Realm & Atlas Device Sync to sync this data between the user’s device and the cloud. Realm ensures that data is stored locally on the device, with real-time synchronization occurring automatically when the device is online. This setup ensures data consistency and offline access, making the application more resilient.
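As a rough illustration of the data being synced, the sketch below models one loan application as a plain document. The field names, the `status` lifecycle, and the validation rules are assumptions made for this example; the article does not specify the actual schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class LoanApplication:
    """Hypothetical shape of one synced loan application document."""
    applicant_id: str
    annual_income: float
    credit_history_months: int
    requested_amount: float
    status: str = "submitted"  # assumed lifecycle: submitted -> scored -> approved/rejected

    def validate(self) -> None:
        # Reject obviously invalid input before it is stored and synced.
        if self.annual_income < 0:
            raise ValueError("annual_income must be non-negative")
        if self.credit_history_months < 0:
            raise ValueError("credit_history_months must be non-negative")
        if self.requested_amount <= 0:
            raise ValueError("requested_amount must be positive")

    def to_document(self) -> dict:
        # Plain dict that could be stored as a MongoDB document.
        self.validate()
        doc = asdict(self)
        doc["submitted_at"] = datetime.now(timezone.utc).isoformat()
        return doc
```

Validating on the device before the write keeps malformed records out of the synced collection, so downstream steps can rely on the numeric fields being present.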
2. AWS SageMaker for Credit Risk Classification
The core of the loan approval system is an AWS SageMaker endpoint that processes loan application data to classify applicants into high-risk or low-risk categories. This classification is done using a custom-trained machine learning model that takes the application data and evaluates the risk level based on historical patterns and predefined criteria.
The model is trained on historical loan data stored in S3 buckets.
Real-time inference runs through a SageMaker endpoint, which receives each application's data and returns a classification.
The resulting risk classification helps loan approvers assess risk more objectively and decide faster.
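The real-time inference call can be sketched as follows. This is a minimal example under stated assumptions: the endpoint name, the feature order, the CSV payload format, the 0.5 threshold, and the idea that the model returns a single probability as text are all hypothetical. The client is passed in so the function can be exercised without AWS credentials; in practice it would be `boto3.client("sagemaker-runtime")`, whose `invoke_endpoint` call is used here.

```python
ENDPOINT_NAME = "credit-risk-endpoint"  # hypothetical endpoint name
FEATURE_ORDER = ["annual_income", "credit_history_months", "requested_amount"]  # assumed
HIGH_RISK_THRESHOLD = 0.5  # assumed cut-off between low-risk and high-risk


def to_csv_row(application: dict) -> str:
    # Serialize the features in the fixed order the model is assumed to expect.
    return ",".join(str(application[name]) for name in FEATURE_ORDER)


def classify(runtime_client, application: dict) -> dict:
    """Send one application to the endpoint and map the score to a risk label.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=to_csv_row(application),
    )
    # Assumption: the model container returns one probability of default as text.
    probability = float(response["Body"].read().decode("utf-8").strip())
    label = "high-risk" if probability >= HIGH_RISK_THRESHOLD else "low-risk"
    return {"probability": probability, "risk": label}
```

Keeping the threshold outside the model makes it possible to adjust how conservative the classification is without retraining.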
3. AWS Lambda for Data Processing
AWS Lambda functions are used to handle data flow between MongoDB Atlas, SageMaker, and other AWS services. When a loan application is submitted via the mobile app, a Lambda function retrieves the application data from MongoDB, processes it, and sends it to the SageMaker endpoint for classification. After the classification is complete, Lambda stores the results back in MongoDB Atlas and updates the loan approval status in the system.
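The fetch, classify, and store flow described above could look roughly like this. It is a sketch, not the actual function: the event shape (`application_id`), the stored field names, and the `"scored"` status are assumptions. The collection is any object with pymongo-style `find_one` and `update_one` methods, and `classify_fn` stands in for the call to the SageMaker endpoint.

```python
def process_application(application_id, collection, classify_fn):
    """Fetch one application, classify it, and store the result.

    collection  -- object with pymongo-style find_one / update_one methods
    classify_fn -- callable returning {"probability": float, "risk": str}
    """
    application = collection.find_one({"_id": application_id})
    if application is None:
        return {"statusCode": 404, "body": f"application {application_id} not found"}

    result = classify_fn(application)
    # Write the classification back next to the original application data.
    collection.update_one(
        {"_id": application_id},
        {"$set": {
            "risk": result["risk"],
            "risk_probability": result["probability"],
            "status": "scored",  # hypothetical status value
        }},
    )
    return {"statusCode": 200, "body": result["risk"]}


def make_handler(collection, classify_fn):
    """Build a Lambda-style handler(event, context) around injected dependencies."""
    def handler(event, context):
        # Assumption: the triggering event carries the application's id.
        return process_application(event["application_id"], collection, classify_fn)
    return handler
```

Injecting the collection and the classifier keeps the handler testable in isolation and lets the database connection be created once per Lambda container rather than once per invocation.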
4. Web Admin Portal for Loan Approval
Loan approvers access the web portal, where they can view loan application details, credit risk scores, and other relevant data stored in MongoDB Atlas. The portal presents the loan details alongside the machine-generated risk classification (high or low). Based on the credit risk score, loan approvers can decide the loan amount, interest rate, and terms.
The system also includes an advanced search feature powered by MongoDB Atlas Vector Search (AVS), which helps approvers find similar loan applications based on applicant profiles. Comparing an application against similar past cases gives approvers more context on the loan portfolio and supports more informed decisions.
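Searching for similar applications generally comes down to ranking stored profile vectors by their closeness to a query vector. The sketch below shows that idea with cosine similarity in plain Python. It is only an illustration of the technique, not the portal's search backend, and the way profiles are turned into numeric vectors is left as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_vector, profiles, top_k=3):
    """Rank stored profiles by similarity to the query.

    profiles maps an application id to its (hypothetical) feature vector.
    Returns up to top_k (application_id, score) pairs, best match first.
    """
    scored = [
        (app_id, cosine_similarity(query_vector, vector))
        for app_id, vector in profiles.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A managed search service performs the same ranking with an approximate index, so it does not have to compare the query against every stored profile.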
5. AWS S3 and SageMaker Data Wrangler for Model Retraining
Over time, as more loan applications are processed, the data becomes valuable for improving the machine learning model. AWS S3 is used to store historical loan application data, and SageMaker Data Wrangler is employed to clean and preprocess the data for retraining the model. Periodic retraining ensures that the model remains accurate and up-to-date as new patterns emerge in applicant data.
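To make "clean and preprocess" concrete, here is a small plain-Python sketch of the kind of transformations such a data flow typically applies: dropping duplicates, imputing missing values, and encoding the training label. Data Wrangler defines these steps in its own visual flow, so this is not its API; the field names and the `"defaulted"` outcome value are assumptions.

```python
from statistics import median


def clean_records(records):
    """Deduplicate by application id, impute missing income, encode the label."""
    # 1. Drop duplicates, keeping the first record seen per application id.
    seen, unique = set(), []
    for record in records:
        if record["application_id"] not in seen:
            seen.add(record["application_id"])
            unique.append(dict(record))  # copy so the input is not mutated

    # 2. Impute missing income with the median of the known values.
    known = [r["annual_income"] for r in unique if r.get("annual_income") is not None]
    fill_value = median(known) if known else 0.0
    for record in unique:
        if record.get("annual_income") is None:
            record["annual_income"] = fill_value

    # 3. Encode the loan outcome as a numeric label (1 = defaulted, 0 = otherwise).
    for record in unique:
        record["label"] = 1 if record.pop("outcome") == "defaulted" else 0
    return unique
```

For example, given two known incomes of 40000 and 60000, a record with a missing income is filled with the median of the two, 50000.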
Solution Workflow
The loan approval process involves several steps that are tightly integrated with AWS services and MongoDB Atlas to ensure efficiency and scalability:
1. Loan Application Submission
The process begins when the user submits a loan application through the mobile app. The app collects essential personal data and application details, which are then synced with MongoDB Atlas using Realm & Atlas Device Sync. This ensures that the data is available in real time, both for the user’s device and for the cloud-based database.
2. Credit Risk Classification
Once the loan application data is submitted, it is sent to AWS Lambda for processing. The Lambda function fetches the application data from MongoDB Atlas and sends it to the SageMaker endpoint for credit risk classification. Based on the model’s prediction, the application is categorized as either high-risk or low-risk.
3. Loan Approval by Admin
The web portal fetches the loan application data from MongoDB Atlas and displays it along with the credit risk score. Loan approvers use the portal to review the application details and decide on the loan amount, interest rate, and approval status. The AVS-powered search feature enables approvers to search for similar applicants, helping them make better, data-driven decisions.
4. Loan Decision Syncing
Once the loan is approved or rejected, the decision is synced back to the applicant’s mobile app via Atlas Device Sync. This ensures that the applicant receives immediate feedback, reducing the time between application submission and loan approval.
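Since the decision reaches the applicant through data sync, the portal's part of this step can be as small as one document update. The sketch below assumes hypothetical field names (`status`, `decided_by`, `decided_at`, `terms`) and takes any collection object with a pymongo-style `update_one` method whose result exposes `matched_count`.

```python
from datetime import datetime, timezone

ALLOWED_DECISIONS = {"approved", "rejected"}


def record_decision(collection, application_id, decision, approver_id, terms=None):
    """Store the approver's decision on the application document.

    Returns True if an application with that id was found and updated.
    """
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")

    update = {
        "status": decision,
        "decided_by": approver_id,
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }
    if decision == "approved" and terms:
        # e.g. {"amount": 12000, "interest_rate": 0.085} -- illustrative only
        update["terms"] = terms

    result = collection.update_one({"_id": application_id}, {"$set": update})
    return result.matched_count == 1
```

Restricting `decision` to a fixed set of values keeps the mobile app from having to handle unexpected statuses.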
5. Model Retraining and Improvement
Loan application data is periodically moved to S3 for analysis and preprocessing using SageMaker Data Wrangler. The prepared data is then used to retrain the machine learning model, allowing the system to keep improving its predictions and risk assessments.
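The periodic export to S3 might be sketched like this. The bucket, key, and column names are placeholders, and CSV is an assumed format. The client is passed in and is expected to behave like `boto3.client("s3")`, whose `put_object` call is used here.

```python
import csv
import io


def export_to_s3(s3_client, bucket, key, records, fieldnames):
    """Serialize records as CSV and upload them as a single S3 object.

    s3_client is expected to behave like boto3.client("s3").
    Returns the number of records written.
    """
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields that are not needed for training.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=buffer.getvalue().encode("utf-8"),
    )
    return len(records)
```

Listing the exported columns explicitly also gives a natural place to exclude personal details that the model does not need.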
Benefits of the Solution
1. Faster Decision-Making
By automating the risk classification with AWS SageMaker, the system enables loan approvers to make faster, more consistent decisions, reducing the time from loan application to approval.
2. Scalability
The system is built to scale. AWS Lambda scales automatically with request volume, SageMaker endpoints can be configured to autoscale, and MongoDB Atlas clusters can be scaled as data grows, so the solution can handle increasing amounts of loan application data and more complex workflows.
3. Real-Time Data Sync
With Realm & Atlas Device Sync, loan application data is automatically synchronized in real time across the mobile app and the cloud, ensuring that both applicants and loan approvers have access to the latest information.
4. Intelligent Risk Assessment
Machine learning with AWS SageMaker gives every application a consistent, data-driven credit risk classification, which reduces variation between individual reviewers and supports better decision-making.
5. Continuous Improvement
The solution’s use of SageMaker Data Wrangler and S3 for periodic retraining ensures that the machine learning model stays up to date with changing data trends, improving its accuracy over time.
Dive deeper on software development trends, emerging technologies and useful tools.
Wekan Enterprise Solutions.
© Wekan Enterprise Solutions · All rights reserved · 14 NE 1st avenue, Miami 33132 FL