Friday, December 29, 2023

Using tools like Apache Spark, Talend, and Java in ETL to create a central data repository

In this blog post, I will show you how to use tools like Apache Spark, Talend, and Java in an ETL process to create a central data repository. A central data repository is a place where you can store and access all your data from different sources in a consistent and reliable way. It can help you improve data quality, reduce data duplication, and enable data analysis and reporting.

To create a central data repository, you need to perform the following steps:

  1. Extract data from various sources, such as databases, files, web services, etc.
  2. Transform data to make it compatible and standardized, such as cleaning, filtering, joining, aggregating, etc.
  3. Load data into the central data repository, such as a data warehouse, a data lake, or cloud storage.

To perform these steps, you can use tools like Apache Spark, Talend, and Java. Apache Spark is a distributed computing framework that can process large-scale data in parallel and in memory. Talend is a data integration platform that can connect to various data sources and provide graphical tools to design and execute ETL workflows. Java is a general-purpose programming language that can be used to write custom logic and scripts for data processing.

Here is an overview of how these tools work together:

  • You can use Talend to design and run ETL jobs that extract data from various sources and load it into Apache Spark.
  • You can use Apache Spark to transform the data using its built-in libraries or custom Java code (see the sketch after this list).
  • You can use Talend or Java to load the transformed data into the central data repository.
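
To make these steps concrete, here is a minimal Java sketch of the Spark part of the workflow: it reads a CSV file, derives a status column from the age column (the same logic used in the Talend example later in this post), and writes the result to a Parquet directory standing in for the central repository. The file paths and the local master setting are illustrative assumptions, not a production setup.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;

public class SparkEtlSketch {

    public static void main(String[] args) {
        // Local SparkSession for experimentation; a real job would point at a cluster.
        SparkSession spark = SparkSession.builder()
                .appName("SparkEtlSketch")
                .master("local[*]")
                .getOrCreate();

        // Extract: read a CSV exported from one of the source systems (path is a placeholder).
        Dataset<Row> customers = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("customers.csv");

        // Transform: derive a status column from the age column.
        Dataset<Row> transformed = customers.withColumn("status",
                when(col("age").lt(30), "young")
                        .when(col("age").gt(60), "old")
                        .otherwise("middle-aged"));

        // Load: write the result into the central repository (here, a Parquet directory).
        transformed.write().mode("overwrite").parquet("central-repo/customers");

        spark.stop();
    }
}
```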

To install Apache Spark, you need to follow these steps:

  1. Download the latest version of Apache Spark from its official website: https://spark.apache.org/downloads.html
  2. Extract the downloaded file to a location of your choice, such as C:\spark
  3. Set the environment variables SPARK_HOME and JAVA_HOME to point to the Spark and Java installation directories, respectively.
  4. Add the bin subdirectory of SPARK_HOME to your PATH variable.
  5. Verify that Spark is installed correctly by running the command spark-shell in a terminal or command prompt. You should see a welcome message and a Scala prompt.

You have now successfully installed Apache Spark on your machine. In the next section, I will show you how to use Talend to design and run ETL jobs that extract data from various sources and load it into Apache Spark.

Talend

Talend is a powerful and versatile tool for designing and running ETL (Extract, Transform, Load) jobs that can handle data from various sources and load it into Apache Spark, a distributed computing framework for large-scale data processing. In this blog post, we will show you how to use Talend to create a simple ETL job that extracts data from a CSV file, transforms it using a tMap component, and loads it into a Spark DataFrame.

The steps to build the job in Talend Studio are as follows:

  1. Create a new project in Talend Studio and name it SparkETL.
  2. In the Repository panel, right-click on Job Designs and select Create job. Name the job SparkETLJob and click Finish.
  3. In the Palette panel, search for tFileInputDelimited and drag it to the design workspace. This component will read the CSV file that contains the input data.
  4. Double-click on the tFileInputDelimited component and configure its properties. In the Basic settings tab, click on the [...] button next to File name/Stream and browse to the location of the CSV file. In this example, we use a file called customers.csv that has four columns: id, name, age, and country. In the Schema tab, click on Sync columns to automatically infer the schema from the file.
  5. In the Palette panel, search for tMap and drag it to the design workspace. This component will transform the input data according to some logic. Connect the tFileInputDelimited component to the tMap component using a Row > Main connection.
  6. Double-click on the tMap component and open the Map Editor. You will see two tables: one for the input data and one for the output data. In this example, we want to transform the input data by adding a new column called status that indicates whether the customer is young (age < 30), old (age > 60), or middle-aged (30 <= age <= 60). To do this, we need to add an expression in the Expression Builder of the status column. Click on the [...] button next to status and enter the following expression:
  7. `row1.age < 30 ? "young" : row1.age > 60 ? "old" : "middle-aged"`
  8. Click OK to save the expression.
  9. In the Palette panel, search for tSparkConfiguration and drag it to the design workspace. This component will configure the connection to Spark and set some parameters for the job execution. Connect the tSparkConfiguration component to the tMap component using a Trigger > On Subjob Ok connection.
  10. Double-click on the tSparkConfiguration component and configure its properties. In the Basic settings tab, select Local mode as the Run mode and enter 2 as the Number of executors. You can also adjust other parameters such as Driver memory or Executor memory according to your needs.
  11. In the Palette panel, search for tCollectAndCheckSparkconfig and drag it to the design workspace. This component will collect and check all the Spark configurations in the job and display them in the console. Connect the tCollectAndCheckSparkconfig component to the tSparkConfiguration component using a Trigger > On Subjob Ok connection.
  12. In the Palette panel, search for tDatasetOutputSparkconfig and drag it to the design workspace. This component will load the output data from tMap into a Spark DataFrame. Connect the tDatasetOutputSparkconfig component to the tMap component using a Row > Main connection.
  13. Double-click on the tDatasetOutputSparkconfig component and configure its properties. In the Basic settings tab, enter customers as the Dataset name. This name will be used to identify the DataFrame in Spark.
  14. Save your job and run it by clicking on Run in the toolbar or pressing F6. You will see some logs in the console that show how your job is executed by Spark. You can also check your Spark UI by opening http://localhost:4040 in your browser.

Congratulations! You have successfully created an ETL job that extracts data from a CSV file, transforms it using Talend, and loads it into Apache Spark.

ETL and FHIR in creating a central data repository

 In this blog post, we will explore how ETL (Extract, Transform, Load) and FHIR (Fast Healthcare Interoperability Resources) can be used to create a central data repository for healthcare data. A central data repository is a single source of truth that integrates data from multiple sources and provides a consistent and reliable view of the data. ETL and FHIR are two key technologies that enable the creation of a central data repository.

ETL is a process that extracts data from various sources, transforms it into a common format, and loads it into a target database or data warehouse. ETL can handle different types of data, such as structured, semi-structured, or unstructured data, and apply various transformations, such as cleansing, filtering, aggregating, or enriching the data. ETL can also perform quality checks and validations to ensure the accuracy and completeness of the data.

FHIR is a standard for exchanging healthcare information electronically. FHIR defines a set of resources that represent common healthcare concepts, such as patients, medications, observations, or procedures. FHIR also defines a common way of representing and accessing these resources using RESTful APIs. FHIR enables interoperability between different systems and applications that use healthcare data.

By using ETL and FHIR together, we can create a central data repository that has the following benefits:

  • It reduces data silos and fragmentation by integrating data from multiple sources and systems.
  • It improves data quality and consistency by applying standard transformations and validations to the data.
  • It enhances data usability and accessibility by providing a common way of querying and retrieving the data using FHIR APIs.
  • It supports data analysis and decision making by enabling the use of advanced tools and techniques, such as business intelligence, machine learning, or artificial intelligence.

Illustration

To illustrate how ETL and FHIR can be used to create a central data repository, let's consider an example scenario. Suppose we have three different sources of healthcare data: an electronic health record (EHR) system, a laboratory information system (LIS), and a pharmacy information system (PIS). Each system has its own data format and structure, and they do not communicate with each other. We want to create a central data repository that integrates the data from these three sources and provides a unified view of the patient's health information.

The steps to create the central data repository are as follows:

  1. Extract the data from each source system using the appropriate methods and tools. For example, we can use SQL queries to extract data from relational databases, or we can use APIs to extract data from web services.
  2. Transform the extracted data into FHIR resources using mapping rules and logic. For example, we can map the patient demographics from the EHR system to the Patient resource, the laboratory results from the LIS system to the Observation resource, and the medication prescriptions from the PIS system to the MedicationRequest resource.
  3. Load the transformed FHIR resources into the target database or data warehouse using FHIR APIs or other methods. For example, we can use HTTP POST requests to create new resources or HTTP PUT requests to update existing resources (a minimal client sketch follows this list).
  4. Query and retrieve the FHIR resources from the central data repository using FHIR APIs or other methods. For example, we can use HTTP GET requests to read individual resources or search parameters to filter and sort resources.
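
As an illustration of steps 2 through 4, the sketch below uses the HAPI FHIR client library (one Java option among several) to build a Patient resource, POST it to a FHIR server, and read it back. The base URL and patient details are placeholders rather than real project values, and authentication is omitted.

```java
import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.rest.api.MethodOutcome;
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Enumerations;
import org.hl7.fhir.r4.model.Patient;

public class FhirLoadSketch {

    public static void main(String[] args) {
        FhirContext ctx = FhirContext.forR4();
        // Placeholder base URL of the central FHIR repository.
        IGenericClient client = ctx.newRestfulGenericClient("http://localhost:8080/fhir");

        // Transform: map extracted EHR demographics onto a FHIR Patient resource.
        Patient patient = new Patient();
        patient.addName().setFamily("Doe").addGiven("Jane");
        patient.setGender(Enumerations.AdministrativeGender.FEMALE);

        // Load: HTTP POST the resource into the repository.
        MethodOutcome created = client.create().resource(patient).execute();
        System.out.println("Created: " + created.getId());

        // Query: HTTP GET the resource back by its server-assigned id.
        Patient stored = client.read()
                .resource(Patient.class)
                .withId(created.getId().getIdPart())
                .execute();
        System.out.println("Read back: " + stored.getNameFirstRep().getNameAsSingleString());
    }
}
```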

By following these steps, we have created a central data repository that integrates the healthcare data from three different sources using ETL and FHIR. We can now access and use this data for various purposes, such as clinical care, research, or quality improvement.

In conclusion, ETL and FHIR are two powerful technologies that can help us create a central data repository for healthcare data. By using ETL and FHIR together, we can overcome the challenges of data integration, quality, usability, and accessibility, and leverage the full potential of our healthcare data.

Monday, August 21, 2023

GSoC Project Final Report : Adding Support for FHIR Patch Operations in OpenMRS

Overview

OpenMRS is using the FHIR API more and more in place of the REST API. However, the FHIR API is, by default, quite verbose. Supporting PATCH operations would allow us to support partial updates to FHIR resources without needing to send the whole resource from the client to the server.

The journey of enhancing OpenMRS through the addition of support for FHIR Patch Operations has been a remarkable experience. In light of the growing importance of the FHIR API as a replacement for the REST API, this project sought to introduce PATCH operations to enable more efficient partial updates to FHIR resources. This feature empowers users to modify specific elements within resources without the need to transmit entire resources between the client and the server.

In the context of the HAPI FHIR library, "patching" refers to the process of making partial updates to a FHIR (Fast Healthcare Interoperability Resources) resource. FHIR resources are representations of healthcare-related data, such as patients, observations, and medications, designed to be easily shared and exchanged between different healthcare systems.

Patching allows you to modify specific parts of a FHIR resource without having to replace the entire resource. This can be particularly useful when you want to make minor updates or corrections to a resource without sending the entire payload over the network. The PATCH operation follows the HTTP PATCH method semantics and is designed to be more efficient than the PUT or POST methods for updating resources, especially when dealing with large resources or slow network connections.
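
As an illustration, here is how a client could send a JSON Patch to a FHIR server using the HAPI FHIR generic client. The server base URL and patient UUID are placeholders, authentication is omitted, and this is a sketch rather than the module's own test code.

```java
import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.rest.api.MethodOutcome;
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.IdType;

public class PatchSketch {

    public static void main(String[] args) {
        FhirContext ctx = FhirContext.forR4();
        // Placeholder endpoint; point this at an OpenMRS FHIR2 server.
        IGenericClient client = ctx.newRestfulGenericClient("http://localhost:8080/openmrs/ws/fhir2/R4");

        // JSON Patch document: only the gender element is changed, nothing else is sent.
        String jsonPatch = "[{\"op\": \"replace\", \"path\": \"/gender\", \"value\": \"female\"}]";

        // HAPI FHIR detects the patch format from the body and issues an HTTP PATCH request.
        MethodOutcome outcome = client.patch()
                .withBody(jsonPatch)
                .withId(new IdType("Patient", "a1b2c3d4-0000-0000-0000-000000000000")) // placeholder UUID
                .execute();

        System.out.println("Patched resource: " + outcome.getId());
    }
}
```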

Objectives:

  • Implement JSON PATCH operations on all OpenMRS FHIR R4 resources and ensure the accompanying tests pass. - COMPLETED ✅
  • Implement JSON MERGE PATCH operations on all OpenMRS FHIR R4 resources and ensure the accompanying tests pass. - COMPLETED ✅
  • Implement XML PATCH operations on all OpenMRS FHIR R4 resources and ensure the accompanying tests pass. - COMPLETED ✅ (illustrative examples of the three patch formats follow this list)
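
To show how the three formats differ, here is the same logical change (setting a patient's gender) expressed in each format, written as Java text blocks (Java 15+) purely for illustration; the element paths and namespaces are simplified examples, not taken from the module's test suite.

```java
public class PatchBodyExamples {

    // JSON Patch (RFC 6902): an ordered list of operations.
    static final String JSON_PATCH = """
            [{ "op": "replace", "path": "/gender", "value": "female" }]
            """;

    // JSON Merge Patch (RFC 7386): a partial resource; only the listed elements are changed.
    static final String JSON_MERGE_PATCH = """
            { "resourceType": "Patient", "gender": "female" }
            """;

    // XML Patch (RFC 5261): operations expressed with XPath-style selectors.
    static final String XML_PATCH = """
            <diff xmlns:fhir="http://hl7.org/fhir">
              <replace sel="fhir:Patient/fhir:gender/@value">female</replace>
            </diff>
            """;
}
```

JSON Patch and XML Patch describe operations to apply, while JSON Merge Patch simply sends the fragment of the resource to overwrite.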

Contributions:

During the project, I worked on various code repositories and pull requests to bring the functionality of PATCH operations to the FHIR API:

Repositories

Pull Requests: 

Found Issues:

    1. Medication Dispense was not added to the landing page of the FhirIG
    2. Medication was not added to the landing page of the FhirIG

Fixed Issues:

    1.  https://github.com/openmrs/openmrs-contrib-fhir2-ig/pull/59
    2. https://github.com/openmrs/openmrs-contrib-fhir2-ig/pull/60

Any other work

During this GSoC journey, I was able to do other work on the FHIR module as assigned by my mentor; below are the pull requests:

    1. FM2-606: Mapping ContactPoints to OpenMRS
    2. FM2-605: Add Support for ETags in the FHIR API
    3. FM2-481: Clean up parameter passing for all Service class methods

Talk Thread links:

  1. https://talk.openmrs.org/t/gsoc-2023-fhir-add-support-for-fhir-patch-operations-project-updates/39555  

Weekly Blog Posts:

Throughout the development cycle, I chronicled my progress and insights through weekly blog posts:

  1. GSOC WEEK 12 - WRAPPING UP A FRUITFUL GSOC JOURNEY: ADDING CONTACT POINTS TO THE OPENMRS-MODULE-INITIALIZER
  2. GSOC WEEK 11: A JOURNEY OF REFINEMENT AND INNOVATION IN FHIR2 MODULE
  3. GSOC WEEK 10 - EMBRACING THE FINALE: REFLECTING ON MY GSOC JOURNEY
  4. GSOC 2023 AT OPENMRS | CODING PERIOD WEEK 09
  5. GSOC 2023 AT OPENMRS | CODING PERIOD WEEK 08
  6. GSOC 2023 AT OPENMRS | CODING PERIOD WEEK 07
  7. GSOC 2023 AT OPENMRS | CODING PERIOD WEEK 06
  8. GSOC 2023 AT OPENMRS | CODING PERIOD WEEK 05
  9. GSOC 2023 AT OPENMRS | CODING PERIOD WEEK 04
  10. GSOC 2023 AT OPENMRS | CODING PERIOD WEEK 03
  11. GSOC 2023 AT OPENMRS | CODING PERIOD WEEK 02
  12. GSOC 2023 AT OPENMRS | CODING PERIOD WEEK 01
  13. GSOC 2023 AT OPENMRS | COMMUNITY BONDING PERIOD

Video (Demo)


Resources:

Future Works:

While significant strides have been made, certain aspects require further attention. Completing the integration of PATCH operations with complex FHIR resources is a priority. Additionally, refining error handling and optimizing performance will ensure a robust implementation.

Part of the work to be done includes, but is not limited to, the following:

  • Ensure the ServiceRequest resource is fully translated.
  • Ensure the ServiceRequest resource supports all the PATCH formats implemented above.
  • Write more tests around how the JSON and XML patch documents should be written, in order to give clearer error messages. (To Be Discussed)
  • We could also implement the PATCH operations for the R3 resources. (To Be Discussed)

Thoughts on GSoC:

Participating in GSoC 2023 on the OpenMRS platform has been an enlightening journey. Working with esteemed mentors like Ian Bacher and Abert Namanya has been instrumental in my growth. I have gained insights into the world of healthcare informatics, API optimization, and collaborative open-source development. Looking forward, GSoC has paved the way for a continued commitment to enhancing healthcare technology. My knowledge of the FHIR standard, the OpenMRS data model, and how the two integrate to complement each other has grown greatly.

A special thanks to GSoC, OpenMRS, mentors, and the vibrant community for making this journey a resounding success! 🌟🌐

Sunday, August 20, 2023

GSoC Week 12 - Wrapping Up a Fruitful GSoC Journey: Adding Contact Points to the openmrs-module-initializer

Introduction:

Greetings, dear readers and fellow enthusiasts of healthcare technology! It is with a mix of emotions that I write this final blog post, marking the culmination of my exhilarating Google Summer of Code (GSoC) journey. As I step into the 12th and final week, I find myself reflecting on the remarkable progress, invaluable learnings, and the gratifying experience of contributing to the OpenMRS community. Join me in celebrating the journey, as I highlight the achievements of this concluding week!

Week 12: A Grand Finale of Innovation and Integration

The 12th week of GSoC was not just a closure but a grand finale, as I embraced the opportunity to contribute a significant feature to the OpenMRS ecosystem. Building upon the groundwork laid in the previous weeks, I embarked on a unique endeavor: integrating Contact Points into the OpenMRS-Module-Initializer. This initiative aimed to enhance the functionality of OpenMRS by ensuring seamless configuration and representation of Contact Points, enriching the interoperability of healthcare data.

A Fusion of Coding and Collaboration:

Week 12 was a harmonious fusion of coding, collaboration, and the pursuit of excellence. Here's a glimpse into the key milestones of the week: with a clear vision of the project's goals, I set out to integrate Contact Points into the OpenMRS-Module-Initializer. This involved configuring Contact Points as attribute types, enabling dynamic updates through the initializer, adding CSV files, and more.

The GSOC Reflection: A Journey of Growth and Impact

As I reflect on this GSoC journey, I'm struck by the depth of growth and impact it has brought. From diving into FHIR PATCH operations to integrating Contact Points, every challenge and accomplishment has contributed to my journey as a developer and an open-source advocate.

A Note of Gratitude: Mentors and Community

My journey wouldn't have been possible without the exceptional guidance and support of my primary mentor, @ibacher, and backup mentor, @abertnamanya. Their expertise, mentorship, and encouragement have been invaluable assets that enriched my GSoC experience.

Looking Beyond: A Future of Possibilities

As GSoC draws to a close, a new chapter begins. Armed with the skills honed, lessons learned, and connections made, I'm excited to continue contributing to OpenMRS and the broader healthcare technology landscape.

Conclusion:

As I bid farewell to this transformative GSoC journey, I am filled with gratitude, accomplishment, and the promise of new beginnings. The path ahead holds countless opportunities to shape healthcare technology, collaborate with passionate minds, and drive innovation.

Thank you for joining me on this incredible voyage. Let's continue to code, collaborate, and create a future where technology and healthcare seamlessly converge. Until we meet again, let's continue making a difference, one commit at a time!

Write code, save lives!

Friday, August 18, 2023

GSoC Week 11: A Journey of Refinement and Innovation in FHIR2 Module

Introduction:

Greetings, fellow enthusiasts of healthcare technology and data interoperability! The journey through Google Summer of Code (GSoC) continues, and as I step into the 11th week, I find myself in the midst of a captivating process of refining and innovating within the OpenMRS FHIR2 module. Building upon the foundations laid in the previous week, I'm excited to share the progress made and the steps taken to enhance the attribute-to-contact-point configuration process. Join me as we explore the intricacies of this journey of refinement and innovation!

A Continuation of Progress:

Week 11 has been a seamless continuation of the work initiated in the previous week. With the guidance of my mentor, I have been actively engaged in refining the pull request initiated in week 10 (https://github.com/openmrs/openmrs-module-fhir2/pull/517). This pull request addresses a critical aspect of configuring attributes as contact points, focusing on ensuring the uniqueness of the combination of attribute_type_domain and attribute_type_id.

Unique Combinations: The Logic Behind the Scenes:

One of the key challenges when dealing with attributes as contact points lies in preventing the duplication of data. This week, the spotlight has been on devising logic that ensures each combination of attribute_type_domain and attribute_type_id remains unique. This logic is essential to maintain data integrity and prevent conflicts within the system.

Suggested Approach: Loading and Updating Existing Values:

To tackle this challenge, a suggestion has been made to load the existing value for a specific combination and update it if it already exists. This approach leverages the power of data retrieval and manipulation to efficiently manage attribute-to-contact-point configurations. By employing this technique, we ensure that the uniqueness constraint is upheld while providing a seamless experience for users and administrators.

Below is the snippet I added to address the changes suggested by my mentor.

	@Override
	public FhirContactPointMap saveFhirContactPointMap(@Nonnull FhirContactPointMap contactPointMap) {
		FhirContactPointMap existingContactPointMap = (FhirContactPointMap) sessionFactory.getCurrentSession().createQuery(
		    "from FhirContactPointMap fcp where fcp.attributeTypeDomain = :attribute_type_domain and fcp.attributeTypeId = :attribute_type_id")
		        .setParameter("attribute_type_domain", contactPointMap.getAttributeTypeDomain())
		        .setParameter("attribute_type_id", contactPointMap.getAttributeTypeId()).uniqueResult();
		
		if (existingContactPointMap != null) {
			existingContactPointMap.setSystem(contactPointMap.getSystem());
			existingContactPointMap.setUse(contactPointMap.getUse());
			existingContactPointMap.setRank(contactPointMap.getRank());
			sessionFactory.getCurrentSession().merge(existingContactPointMap);
			return existingContactPointMap;
		} else {
			sessionFactory.getCurrentSession().saveOrUpdate(contactPointMap);
			return contactPointMap;
		}
	}

Let's break down the code and discuss its functionality:

Explanation:

  • The method takes a `FhirContactPointMap` object (`contactPointMap`) as a parameter and returns a `FhirContactPointMap` object.
  • The code queries the database to check if there is an existing `FhirContactPointMap` with the same `attributeTypeDomain` and `attributeTypeId`. This is done using Hibernate's Query Language (HQL).
  • If an existing map is found, its `system`, `use`, and `rank` values are updated with the values from the `contactPointMap` parameter. The `merge` method is used to update the existing entity.
  • If no existing map is found, the `saveOrUpdate` method is used to either save a new `contactPointMap` or update an existing one, based on its state.

This method essentially checks whether a given `FhirContactPointMap` already exists in the database based on the specified attributes (`attributeTypeDomain` and `attributeTypeId`). If it exists, the method updates its values; otherwise, it creates a new entry or updates an existing one.

Mentorship and Collaboration: Navigating Complex Challenges:

As with any intricate technical challenge, mentorship and collaboration have played a pivotal role in navigating through the complexities. My mentor's insights and guidance have been invaluable in devising the most effective and elegant solutions. Collaborative discussions and code reviews have provided fresh perspectives and refined the implementation, aligning it with best practices and the standards of the OpenMRS community.

The Road Ahead: Beyond Configuration:

As week 11 draws to a close, I'm filled with a sense of accomplishment and eagerness for what lies ahead. The journey of refining and innovating within the FHIR2 module is a testament to the spirit of continuous improvement and the drive to enhance healthcare data interoperability. Beyond attribute-to-contact-point configuration, this experience underscores the broader mission of GSoC – to contribute to meaningful solutions that impact patient care and healthcare technology.

Conclusion:

Week 11 has been a captivating chapter in the GSoC journey, marked by diligent refinement and the pursuit of innovative solutions. The process of ensuring unique combinations of attributes within the FHIR2 module represents a microcosm of the larger goal – to create a more connected, efficient, and effective healthcare ecosystem.

As I look ahead to the final weeks of GSoC, I'm invigorated by the progress made and inspired by the collaborative spirit that drives the OpenMRS community. Join me as we continue to push the boundaries of healthcare technology and explore new frontiers in the quest for excellence.

Thank you for being a part of this incredible journey. Until next time, let's keep pushing forward and making a positive impact!

Thursday, August 10, 2023

Embracing the Finale: Reflecting on My GSoC Journey - GSoC Week 10



Introduction:

Greetings, dear readers! It's with a mix of emotions that I pen down this blog post, for the end of my Google Summer of Code (GSoC) journey is drawing near. As I look back on the incredible weeks that have led me to this point, I'm filled with a sense of accomplishment, gratitude, and a touch of nostalgia. Join me as I reflect on the experiences, challenges, and growth that have defined this remarkable journey.

A Journey of Learning and Growth:

The past weeks of GSoC have been a whirlwind of coding challenges, collaboration with mentors and peers, and a continuous drive to make meaningful contributions. From delving into the complexities of healthcare interoperability to mastering the nuances of the OpenMRS data model, every step of this journey has been a learning opportunity.

New Horizons: Mapping FHIR Contact Points to OpenMRS Data Model:

In the realm of healthcare data interoperability, mapping data structures from one standard to another can be both fascinating and complex. One such challenge arises when we consider "Contact Points" in FHIR and their representation in OpenMRS. In this blog post, we'll explore the intricacies of this mapping process, particularly focusing on the intriguing concept of determining the priority or "rank" of different values. Join me as we delve into the world of Contact Points and unravel the puzzle of priorities!

Understanding Contact Points in FHIR and OpenMRS:

In the Fast Healthcare Interoperability Resources (FHIR) standard, Contact Points play a vital role in representing various means of contacting an individual, such as phone numbers, emails, faxes, and pagers. These contact details hold different types and values, providing comprehensive ways to reach out to patients, providers, and other stakeholders.

On the other hand, OpenMRS manages these types of data through Attributes. Attributes like PersonAttributes, LocationAttributes, and ProviderAttributes are used to store additional information about persons, locations, and providers. While the structure seems clear, a significant challenge arises when we consider the ranking or priority of these values.

Prioritizing Values: The FHIR "Rank" Conundrum:

FHIR introduces the concept of "rank" to indicate the relative priority of different contact values. For example, if a patient has multiple phone numbers, the rank helps determine which phone number should be preferred for communication. This prioritization ensures efficient and effective contact strategies. In the OpenMRS ecosystem, this notion of "rank" is not inherently present. The challenge lies in translating FHIR's rank-based approach into OpenMRS's attribute-based system. How do we decide which phone number or email to prioritize when contacting a patient?

Solution

The solution we came up with was to create a mapping table called fhir_contact_points_map between attribute and attribute_type that stores values like system, use, and rank. It is those values that we then translate in the telecom translator. Catch the ongoing work at https://github.com/openmrs/openmrs-module-fhir2/pull/517/files
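
As a rough sketch of what the telecom translator ends up producing, the snippet below builds a FHIR ContactPoint with HAPI FHIR's R4 model classes. The phone value and the system/use/rank values are illustrative stand-ins for what would be read from a person attribute and its row in fhir_contact_points_map.

```java
import org.hl7.fhir.r4.model.ContactPoint;

public class ContactPointRankSketch {

    // Builds a FHIR ContactPoint from an attribute value plus the system/use/rank
    // looked up in the mapping table (all values here are illustrative).
    public static ContactPoint toTelecom(String attributeValue) {
        return new ContactPoint()
                .setSystem(ContactPoint.ContactPointSystem.PHONE) // mapped "system"
                .setUse(ContactPoint.ContactPointUse.MOBILE)      // mapped "use"
                .setRank(1)                                       // mapped "rank" (1 = most preferred)
                .setValue(attributeValue);                        // the raw attribute value
    }
}
```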

The Joy of Collaboration:

The heart of open-source development lies in collaboration. Through code reviews, discussions, and interactions with the OpenMRS community, I've come to appreciate the power of collective wisdom. Collaborating with experienced mentors and learning from their feedback has been instrumental in elevating the quality of my work.

Celebrating Milestones:

Looking back, I can't help but celebrate the milestones achieved during this journey. From implementing critical features to squashing bugs and enhancing documentation, each achievement has contributed to the evolution of the OpenMRS FHIR2 module. These accomplishments wouldn't have been possible without the encouragement and support of the community.

Personal Growth and Beyond:

Beyond the code, GSoC has been a catalyst for personal growth. I've become a more resilient problem-solver, a better communicator, and a more effective collaborator. These skills extend beyond the realm of technology, enriching my journey both as a developer and as an individual.

A Bittersweet Farewell:

As GSoC reaches its conclusion, I can't help but feel a bittersweet mix of emotions. While saying goodbye to this chapter is not easy, I'm filled with gratitude for the experiences, connections, and insights I've gained. GSoC has been more than just a coding program; it's been a transformative experience that will resonate in my journey ahead.

Conclusion:

As I bid adieu to GSoC, I want to express my heartfelt gratitude to the OpenMRS community, mentors, peers, and readers like you who have been a part of this journey. The end might be nigh, but the memories, lessons, and connections forged during these weeks will endure. This is not the end, but a stepping stone to new endeavors, fresh challenges, and a continued commitment to the open-source spirit.

Thank you for joining me on this adventure. Here's to the end and to new beginnings! Until we meet again.

Cheers!

Thursday, August 3, 2023

GSoC 2023 at OpenMRS | Coding Period Week 09

 

Introduction

Welcome back to my GSoC journey at OpenMRS! Week 9 has been an exciting and challenging phase as I continued my work on the OpenMRS FHIR2 module. After successfully completing the patching operations, I eagerly embarked on the next task from my mentor, which was to support ETags in the module. In this blog post I will share my experiences, achievements, and the lessons I have learned during this week of my project.

Understanding ETags

Before diving into the implementation, I spent some time researching and understanding what ETags are and how they are used. ETags, short for "entity tags", are a mechanism used for resource versioning and caching in web applications. In the context of FHIR, ETags play a crucial role in ensuring efficient communication between clients and servers. They provide a way for clients to keep track of resource versions and determine whether resources have been updated on the server since they were last retrieved. The purpose of this task was to trigger HAPI FHIR's support for the FHIR ETag spec, which uses weak ETags for the version id.

Note: this algorithm could result in the FHIR API reporting a changed ETag where the REST API does not, because the ETag/versionId in the FHIR API is derived entirely from the lastUpdated timestamp.

Here is the ticket -> https://issues.openmrs.org/browse/FM2-605.

Here is what I did to that effect:

  • I added the logic to generate and manage ETags for each resource using the version id, which is derived from the lastUpdated timestamp. So whenever a resource is updated, its ETag is automatically updated.
  • I added unit tests and integration tests to ensure the functionality works as expected and, of course, to ensure that it doesn't break the existing tests and functionality.

Explanation of how it works in OpenMRS

When a GET request is sent to the server to retrieve a resource, for example a Patient resource with a given uuid such as 3d50d0c2-257e-4262-a48b-3e9b3dbffefd, the response is returned with an ETag in the response header, e.g. ETag: W/"3141". When the client sends another request to the server with an If-None-Match header containing that ETag, the server returns a 304 status code if the resource has not changed since the last retrieval. If the resource has changed, a 200 status code is returned with the latest version of the resource and a new ETag.
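
Here is a small, self-contained sketch of that interaction using Java's built-in HTTP client (Java 11+). The endpoint and UUID are the placeholders from the example above, and authentication headers are omitted for brevity.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EtagConditionalGet {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Placeholder OpenMRS FHIR2 endpoint and patient uuid.
        URI patientUri = URI.create(
                "http://localhost:8080/openmrs/ws/fhir2/R4/Patient/3d50d0c2-257e-4262-a48b-3e9b3dbffefd");

        // First GET: the response carries the resource plus a weak ETag header, e.g. W/"3141".
        HttpResponse<String> first = client.send(
                HttpRequest.newBuilder(patientUri).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        String etag = first.headers().firstValue("ETag").orElseThrow();

        // Second GET with If-None-Match: 304 means the cached copy is still current,
        // 200 means the resource changed and a new version (and ETag) is returned.
        HttpResponse<String> second = client.send(
                HttpRequest.newBuilder(patientUri).header("If-None-Match", etag).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("Second response status: " + second.statusCode());
    }
}
```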

Resources I used



Until next time,
Cheers