California Health and Human Services Agency: Creating a Culture of Open Data

Introduction

The California Health and Human Services Agency oversees 12 departments and three offices, employs nearly 34,000 staff and administers a budget of $140 billion. In 2014, when the Agency embarked on a unified process to publicly release its non-confidential data, it had a complex task before it. Successfully launching a portal for its public data required executive leadership, a shared vision across diverse programs, commitment by personnel and public-private partnerships. Almost two years later, the project has succeeded, allowing health advocates, civic technologists, journalists and anyone else to access and use data more easily than ever before. As a result of its success, the Agency plans to continue the open data program as part of a larger effort to better use information and data inside and outside of government.

Project Summary

Converging Forces Inside and Outside of Government

This story begins inside and outside of government. Inside the California Health and Human Services Agency, both people and processes converged:

  • People: Two departments, the California Department of Public Health and the Office of Statewide Health Planning and Development, are data-rich organizations that already published data in a number of different formats. Key staff in those and other departments, along with the Agency’s Chief Information Officer, had an interest in data sharing across departments and making data more available and transparent.

  • Process: In 2010 the Agency received a federal Health Information Exchange grant as part of the American Recovery and Reinvestment Act. While the grant mainly focused on data sharing among healthcare providers, it also allowed the Agency to focus on internal data sharing to improve the services its departments provide to the public. The Agency’s success in creating a health information exchange plan supported a second award in 2012, which it used to examine opportunities for health and human services data sharing and interoperability. This second grant resulted in two outcomes:

    • The Agency began identifying its internal data: what it held, where that data was housed and in what formats. This created new opportunities for departments to share data with each other.

    • The Agency identified a governance structure for data, composed of a cross-department committee, which focused on leveraging data to better support the Agency’s mission. This governance structure would prove to be a critical element in the success of the open data program.

Outside of government, one foundation played a critical role in bringing this project to fruition:

  • At the end of 2011, the California Health Care Foundation began a “Free the Data” initiative. As part of that initiative, it sought a partnership with a state agency that would result in greater public engagement with government data. Today, many agencies publish data in charts embedded in reports, or offer it as downloadable files. In the latter case, either the file format or limited descriptions of the data often prevent researchers and community members from analyzing and using the information independently. In contrast, the Free the Data initiative advocated for “open” data: data that carries no restrictions on use, is machine-readable and is accessible via an API (Application Programming Interface). An API is a way of publishing data that allows programmers to connect their websites and applications directly to the data, which makes updates automatic and use easier.
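To make the API idea concrete, here is a minimal Python sketch of how a program might pull records from a dataset published through an API. The endpoint URL, the dataset name and the assumption that the API returns a JSON array of records are all hypothetical; the source does not describe the portal’s actual API. Because the program re-downloads the data each time it runs, updates made by the publisher show up automatically.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint: the real portal's URLs and response format may differ.
EXAMPLE_URL = "https://example.org/api/datasets/example-dataset.json"


def parse_rows(payload: str) -> list[dict]:
    """Parse a JSON array of records, a shape many open data APIs return."""
    rows = json.loads(payload)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of records")
    return rows


def fetch_rows(url: str = EXAMPLE_URL) -> list[dict]:
    """Download the current rows; rerunning this picks up any published updates."""
    with urlopen(url, timeout=30) as resp:
        return parse_rows(resp.read().decode("utf-8"))


# Offline example with a literal, illustrative payload instead of a network call.
sample = '[{"county": "County A", "rate": 0.95}, {"county": "County B", "rate": 0.9}]'
rows = parse_rows(sample)
print(len(rows))  # 2
```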

As Agency and Foundation staff began to discuss a partnership, they considered a number of opportunities: publishing existing data in more accessible forms, publishing new non-confidential data, securely sharing confidential data across departments and ensuring that data from different departments was interoperable. They chose to start with one cornerstone of this larger effort: making existing public data more available, that is, making it open. They believed starting here would create examples showing how open data could inform decision making and drive better health outcomes in local communities. They also believed the emerging governance structure would support an Agency-wide open data project, and that the project would in turn be an ideal opportunity to establish data governance. Other government open data projects have emphasized reducing requests for public records. In interviews, however, senior leaders at the Health and Human Services Agency made clear that they saw making more information available to the public as part of “good government,” and expected it to catalyze more strategic use of data internally, helping departments make better decisions and better serve the public. While open data may ultimately decrease costs through fewer Public Records Act requests, Agency leaders view that as a welcome by-product, not the reason for the project.

Executive Commitment

As in any organization, senior leadership needed to sign off before an Agency-wide initiative could begin. For the Health and Human Services Agency, that meant the Undersecretary for Budget and Administration needed to agree that the project was worth the staff time and would produce useful results. To prepare for the project, and the pitch, the Agency’s Chief Information Officer convened an informal group of data management leaders from several departments. This group strategized about how to gain executive buy-in for the project. Concurrently, the Foundation also met with senior leadership in the Agency. Despite this planning, the Undersecretary was not initially convinced of the project’s merits. There was a language gap between the project team and the Undersecretary. The departments and the Foundation talked about “open data” and “hacking,” both terms that sound risky to an Agency safeguarding the private health information of millions of people. After some months of discussion, the Undersecretary and the department staff found common language in a framework originally developed by New York State’s Health Department, which had classified data into three tiers: Tier 1, clearly public data, often already published; Tier 2, data that might be made public after some review or cleaning; and Tier 3, private and confidential data that the Department will never make public. The project team made clear that it would focus only on Tier 1 data at first and on Tier 2 data in later stages. The Agency would never publicly release Tier 3 data. Adopting this rubric addressed the Undersecretary’s major reservations, transforming his skepticism into active support.
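The three-tier rubric works as a simple publication gate. The Python sketch below is purely illustrative: the names `Tier`, `Phase` and `may_publish`, and the idea of encoding the rubric in code at all, are assumptions for this example rather than anything the Agency or New York describes.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical encoding of the three-tier data classification."""
    PUBLIC = 1        # Tier 1: clearly public, often already published
    REVIEWABLE = 2    # Tier 2: might be made public after review or cleaning
    CONFIDENTIAL = 3  # Tier 3: private and confidential, never published


class Phase(Enum):
    """Hypothetical project phases: Tier 1 first, Tier 2 in later stages."""
    INITIAL = 1
    LATER = 2


def may_publish(tier: Tier, phase: Phase, review_approved: bool = False) -> bool:
    """Return True if a dataset of this tier is eligible in this phase."""
    if tier is Tier.CONFIDENTIAL:
        # Tier 3 data is never released, regardless of phase or review.
        return False
    if tier is Tier.PUBLIC:
        # Tier 1 data is eligible from the start.
        return True
    # Tier 2 data is considered only in later stages, and only after review.
    return phase is Phase.LATER and review_approved


print(may_publish(Tier.REVIEWABLE, Phase.INITIAL, review_approved=True))  # False
```

The point of the sketch is that Tier 3 is rejected before any other check, mirroring the commitment that confidential data would never be released.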

Planning for Success

With executive sponsorship in place, the project began in earnest. The California Health Care Foundation provided a $250,000 grant to cover subscription costs for a service to host the data and provide data visualizations, charts, graphs and other tools that the public could use to understand the data. This allowed the Agency team, composed of staff from different departments and the Office of the Secretary, to focus on process rather than technology. The team could have launched the project simply by moving datasets that were already available in Excel or PDF onto the portal. It chose not to, because the ultimate goal was to put in place a structure that could accommodate both open data publishing and secure data sharing between departments. Instead, the team transformed the data into machine-readable formats and created a robust process that included data review by the governance team.

The leadership team did not have to start from scratch. Open data initiatives are expanding across the country. In New York, the state’s health department created a handbook for its initiative, and the California team built on this handbook, customizing it for its own needs. Such cross-jurisdictional collaboration is common throughout the open data movement: an organization starting a new initiative can likely borrow from and build on what another city or state has done.

Piloting the Project

Once procedures were in place, the team was ready to begin publishing datasets. One department needed to pilot the project, using its datasets to test the process and the governance structure. The California Department of Public Health, whose staff had served on the leadership team and had invested significant time in drafting the handbook, volunteered. This made sense: as a data-rich department, it had a culture that valued the project. The department also had the knowledge and abilities to react nimbly when inevitable issues arose during implementation. The Department of Public Health is large, with approximately 3,500 staff, which allowed it to devote the skilled resources required to clean and reformat the data. Finally, choosing Public Health was an important way to acknowledge the departmental resources it had already invested in the project.

Publishing Data

It took the Department of Public Health four months to identify, organize and clean the initial datasets. The department published the data on a commercial platform paid for through the grant from the California Health Care Foundation. The reaction from the public was overwhelmingly positive, and journalists and civic groups alike began to use the data.

After the successful pilot, the Undersecretary chose the Office of Statewide Health Planning and Development to head up the next phase. Like Public Health, this office is a data-rich organization with significant capacity to use data, and its team had been involved in the project from its inception. At that point, the Agency decided that it did not make sense for each department to maintain its own portal: members of the public would have difficulty finding data dispersed across 12 distinct data portals. It made more sense to create one Agency-wide site that would host data from all departments. The Office of Statewide Health Planning and Development then became the first department to publish on the California Health and Human Services Agency’s open data site.

Once Public Health and the Office of Statewide Health Planning and Development published their initial data, the Agency needed a way to bring departments with less experience publishing data into the project. The solution was to leverage the Agency governance structure to organize data managers from each department and to designate one department, the Office of Statewide Health Planning and Development, to provide project management, technical administration and training for the Agency. As other departments came on board, the project management team sequenced the order of publications, provided training to each department’s project team and offered technical assistance whenever a department needed additional help. It also created a working group, under the Agency governance structure, that each department joined as it came on board. This peer-learning group facilitated sharing among departments as they moved through the phases of data publication: identifying data, making data ready for publication, publishing data and responding to feedback from stakeholders.

However, even with the lessons from the pilot, departmental implementation still faced challenges. For example, the Agency had to find a way to streamline the review process. The initial process was intensive: the project’s Executive Sponsor, the Undersecretary, signed off on all data elements. Once the Agency was confident that the process worked and appropriately reduced risk, the Secretary’s office began to step back from the day-to-day role and delegate. Eventually, the Undersecretary delegated final data sign-off to the Agency data governance committee, under the Agency CIO.

Connecting with the Community

Once a department published data, its public information officers heard from members of the public and stakeholder groups, particularly journalists, early and often. That left open questions: how could the open data project connect with communities that knew nothing about the Agency or its data, and how could it connect more formally with developer communities that might want to use the data? The California Health Care Foundation had an idea to help bridge the gap between the availability of open data and its use. It created a Health Ambassadors program, initially enlisting local civic technologists in three cities: Fresno, Los Angeles and Sacramento. In partnership with community stakeholders, these ambassadors have created, and continue to create, online tools using data that the California Health and Human Services Agency (CHHS) has made available through the open data project. The Ambassadors have also worked with local Code for America brigades to coordinate health data “code-a-thons” in several California communities. Code-a-thons are events at which application developers are challenged to use CHHS data in new and innovative ways. These events build relationships between CHHS and local stakeholders and have resulted in several applications that use CHHS data.

Additionally, working with the California Health Care Foundation and other partners, the Agency now holds a yearly “Open Data Fest.” This convening of stakeholders, practitioners, thought leaders and government staff furthers the dialogue about the value and benefits of open data publishing. The event is strategic rather than nuts-and-bolts, providing important space for interdisciplinary, public-private-nonprofit collaboration on the future of health data and open engagement.

How Did They Do It?

During the last two years, the California Health and Human Services Agency has reimagined how it makes data publicly available. Accomplishing this task required executive support, strong governance and, of course, staff time. The Agency was also able to take advantage of a well-timed outside grant, an emerging governance structure and a strong level of trust across departments, which helped it overcome obstacles along the way.

Resources Required

Executive Support: Staff across the Agency indicated that executive support was critical to making open data an important project for all departments. Executive backing enabled the project leadership team to bring all decision makers to the table and empowered staff to find solutions to obstacles.

Governance Structure: Before the Agency identified a single dataset to publish on its portal, it created a governance structure for the project. Subject-matter experts, attorneys and public information officers sat at the same table from the beginning. The leadership team developed policies to identify, review and publish the data, giving comfort at all levels of the organization that it could safely publish this information in accessible formats. While all Agency departments are dedicated to providing access to essential health and human services, they each have different missions, operations, resource allocations and stakeholders. Building the project team with a cross-section of department leadership resulted in program policies reflecting Agency-wide opportunities and constraints, rather than just those of a single department.

Staff Time: While the data that the departments published were already public in some form, each department now had to organize the information to meet common standards. Making data machine-readable often requires reorganizing the data and changing existing data management processes. For example, if a department had published a single Excel file with six different tabs, it now needed to reorganize each of those tabs into a separate file. Data that had been published as PDFs needed to be converted into a machine-readable format such as Comma Separated Values (CSV). Sometimes, too, a department had laid out its data in a way that a computer could not readily read. Importantly, because the data were now widely available on a single website, it was critical that departments document, through metadata, how to use each dataset and whether it had any limitations. Staff from the departments that participated in the leadership team invested significant time creating the structure for this program to work. Without their commitment to the project, work that was often in addition to their regular job duties, the Agency would not have an open data program today.
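As a rough illustration of the tab-splitting and documentation work described above, the Python sketch below writes each tab of a workbook to its own CSV file with a small JSON metadata file alongside it. It assumes the workbook has already been read into a dictionary mapping tab names to rows (reading the Excel file itself would need a third-party library and is not shown); the function name `split_tabs_to_csv` and the metadata fields are hypothetical, not a standard the Agency adopted.

```python
import csv
import json
import tempfile
from pathlib import Path


def split_tabs_to_csv(workbook: dict[str, list[list[str]]], out_dir: Path,
                      description: str) -> list[Path]:
    """Write each tab of a pre-loaded workbook to its own CSV file.

    `workbook` maps a tab name to its rows; the first row is the header.
    A small JSON metadata file is written next to each CSV.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tab_name, rows in workbook.items():
        # Derive a file-system-friendly name from the tab name.
        stem = tab_name.strip().lower().replace(" ", "_")
        csv_path = out_dir / f"{stem}.csv"
        with csv_path.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        # Minimal metadata: title, description, column names and row count.
        meta = {
            "title": tab_name,
            "description": description,
            "columns": rows[0] if rows else [],
            "row_count": max(len(rows) - 1, 0),
        }
        meta_path = out_dir / f"{stem}.metadata.json"
        meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
        written.append(csv_path)
    return written


# Placeholder tabs standing in for a multi-tab workbook (values are illustrative).
example = {
    "Table 1": [["county", "count"], ["County A", "10"], ["County B", "20"]],
    "Table 2": [["year", "rate"], ["2014", "1.5"]],
}
out = Path(tempfile.mkdtemp())
paths = split_tabs_to_csv(example, out, "Illustrative example only")
print([p.name for p in paths])  # ['table_1.csv', 'table_2.csv']
```

A real metadata record would carry much more, such as data sources, update frequency and known limitations, but even this minimal version pairs each file with a description of its contents.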

Accelerating the Timeline

California Health Care Foundation Grant: Many local, state and federal government open data projects have succeeded without outside grant funding, and the Agency could have launched an open data portal using internal resources. The grant was nonetheless invaluable, particularly because it allowed the Agency to move quickly. Rather than carving money out of an existing, constrained budget, the Agency could devote its staff time to publishing rather than procurement, which in turn kept the focus on process rather than technology. Knowing that the grant funding was not sustainable, the Secretary’s Office also made project sign-off contingent on departments agreeing to use existing funding to support the project at the end of the pilot phase.

Existing Cross-Department Dialogue and Governance Process: As a result of the 2010 and 2012 federal grants, the Agency identified how data was used internally and began having conversations about how it could be used more effectively. This opened a dialogue across departments for the open data activities that began in 2013. These earlier grant-funded projects informed the Agency’s vision for a cross-department governance structure supporting the open data project.

Lessons and Products from Existing Programs: The California Health and Human Services Agency drew upon the experiences from other cities, counties and states. In particular, it built on the Open Data Handbook authored by New York’s Department of Health. This made a heavy lift more manageable and also provided a model the Agency could cite to show that the process worked.

Trust: Mutual trust among partners and reassurance from executive levels accelerated the project’s timeline. From previous joint projects, the Agency and the Foundation already knew they could work together. Department staff also knew that the Agency would help them meet their goals and that they were all accountable for the project’s success.

Challenges

The California Health and Human Services Agency’s project officially began in April of 2014. By March 2016 all 12 departments had published data to the portal. Both the leadership team and department data stewards had to overcome obstacles in order to meet this compact timeline. Staff and management identified three main challenges: organizational structure, staff engagement and data limitations.

Organizational Structure: While the 2010 and 2012 federal grants laid the groundwork for an agency-wide data project, open data was still new to many departments. Historically, each department had acted fairly independently of the others. This project drove a cultural change requiring departments to adopt new norms for cross-department collaboration. Given the project’s large scale and the Agency’s concerns about protecting sensitive data, it is unsurprising that during the pilot phase the Agency adopted an intensive review process that ended with the Undersecretary approving individual data records and data dictionaries. However, once the Secretary’s office was comfortable with how the process was working, it delegated its authority to the Agency CIO and governance committee, shortening the publication workflow.

Consolidating department data on a single Agency portal, rather than maintaining 12 separate department portals, was a deliberate decision with practical benefits. A unified portal required fewer resources, which some departments lacked, and streamlined what could have been a more complicated publishing process. To plan, implement and maintain a unified portal, the Agency created a governance model that brought together departments and staff that had previously worked separately. This process has served as a catalyst for a more collaborative Agency culture.

Staff Engagement: The project team knew that even with sponsorship and support from the Secretary’s Office, each department needed to believe in the project’s goals and merits. To promote this buy-in, the project management team in the Office of Statewide Health Planning and Development held an initial meeting with each department’s leadership team. During this meeting, the team discussed the project’s history and why data, and open data in particular, is an important function of the Agency. This helped inform each department’s thinking about why open data is important, even if the department had typically published data in other, more restricted ways. The project management team also provided significant assistance and support to staff in departments across the Agency. Whether answering technical questions, providing training or giving people needed encouragement to finish the work, the project management team was critical to the Agency-wide deployment of open data.

Data Limitations: Every dataset has limitations. For example, a dataset’s granularity might make it useful for county-level analysis but not for city-level analysis. Similarly, some existing datasets are easier to convert into a machine-readable format than others. A department that already publishes data regularly in CSV format (a plain-text table, similar to a single Excel sheet without tabs or formatting) and has already written a robust description of the dataset may be ready to publish with little additional work. However, if the data are organized in a way a computer cannot read, the department still needs to reformat the information.

The Health and Human Services Agency expected, and received, significant feedback from stakeholders. Some wanted more recent or more granular data than had previously been published. Others found errors and inconsistencies in the data. The Agency found that the most effective way to address these issues was to implement a process for taking and acting on feedback from the public. It also started small, with several small datasets from each department rather than all datasets from all departments. In effect, each department piloted its own datasets, received feedback and, if required, modified how it published either the data or the supporting information.

While the pilot program has focused only on publishing data that was already publicly available, stakeholders continue to request that the Agency publish additional data. Where the information is clearly public, the decision is easy and the Agency only needs to prioritize which data to publish first. Other datasets require more analysis. Could someone re-identify personal information? Is there a sensitivity that does not exist in clearly innocuous datasets like “Most Popular Baby Names”? To answer these questions, an Agency team is currently developing a common set of data de-identification procedures for all departments to use.

Initial Impact

As of April 2016, more than 80,000 users had visited the portal, which has averaged more than 7,000 unique visitors per month since May 2015. The open data project is also popular among journalists and civic technologists. During the 2014 measles outbreak, for example, journalists at the New York Times and Los Angeles Times used immunization data from the portal in their reporting. Civic technologists, often volunteers who give their time and skills to make data more available to the public, have created several tools with the data (see WICit and AsthmaStoryCA.org, for example).

The Agency team did not launch the portal expecting it to decrease Public Records Act requests, but some staff indicated that they have seen a drop in requests. Others said it was now simply easier to respond to requests because some of the needed data was on the portal. Still others said they did not see a direct connection, because many Public Records Act requests are for specialized datasets, including confidential information, that are unavailable on the portal.

Internally, the project has helped foster a culture of data sharing and interoperability across departments. Sharing data among departments can be difficult because each department may code the same entity or action differently. For instance, three departments collect data on hospitals, but each identifies them differently in its data, making comparisons across datasets difficult. In response, a crosswalk was created so that users both inside and outside government could analyze hospital facility data across different datasets. By understanding their data assets, departments are better able to support their missions. The open data project has created opportunities for departments to share data with each other and the public. This has resulted in internal and external collaboration and innovation. While the program is still young, initial results suggest that the effort by Agency staff to make their data more accessible has benefited internal and external users, and most importantly, the people of California who are served by the California Health and Human Services Agency.
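A crosswalk of the kind described above can be sketched as a lookup table from each department’s own hospital identifier to one shared facility ID. In the Python sketch below, the department names, identifiers, field names and values are all invented placeholders; the source does not describe the actual crosswalk’s structure.

```python
from collections import defaultdict

# Hypothetical crosswalk: (department, department's own hospital ID) -> shared facility ID.
CROSSWALK = {
    ("dept_a", "A-001"): "FAC-1",
    ("dept_b", "HOSP 17"): "FAC-1",
    ("dept_c", "9001"): "FAC-1",
    ("dept_a", "A-002"): "FAC-2",
    ("dept_b", "HOSP 18"): "FAC-2",
}


def combine(records):
    """Group (department, local_id, field, value) records by shared facility ID.

    Records whose local ID is missing from the crosswalk are returned
    separately so they can be reviewed instead of silently dropped.
    """
    combined = defaultdict(dict)
    unmatched = []
    for dept, local_id, field, value in records:
        facility = CROSSWALK.get((dept, local_id))
        if facility is None:
            unmatched.append((dept, local_id))
            continue
        # Prefix each field with its department so same-named fields don't collide.
        combined[facility][f"{dept}.{field}"] = value
    return dict(combined), unmatched


# Illustrative placeholder records, one per department, plus one unknown ID.
records = [
    ("dept_a", "A-001", "beds", 120),
    ("dept_b", "HOSP 17", "discharges", 4000),
    ("dept_c", "9001", "license_status", "active"),
    ("dept_b", "HOSP 99", "discharges", 10),
]
combined, unmatched = combine(records)
print(combined["FAC-1"])
# {'dept_a.beds': 120, 'dept_b.discharges': 4000, 'dept_c.license_status': 'active'}
print(unmatched)  # [('dept_b', 'HOSP 99')]
```

Keeping unmatched records in a separate list, rather than discarding them, is one way to surface identifiers that still need to be added to the crosswalk.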