Variations of Open

Introduction

The word “open” gets used so often in tech that it starts to feel universal, like everyone must be talking about the same thing. But once you listen closely, it becomes obvious that different groups mean very different things when they say it. A software engineer is thinking about readable source code and licenses. Someone who works with data is thinking about public portals and Creative Commons. People in AI might be picturing model weights you can download even if you can’t see how the model was trained. And increasingly, someone might just mean information that’s publicly visible online, such as social media posts, ship trackers, or livestreams, without any license at all.

None of these interpretations are wrong. They just grew out of different needs. Openness meant one thing when it applied to code, something else entirely when governments began releasing public data, and now it’s shifting again as AI models enter the mix. Meanwhile, the rise of OSINT has blurred things further, with “open” sometimes meaning nothing more than “accessible to anyone with an internet connection.”

The result is that modern systems combine pieces from all these traditions and people often assume they’re aligned when they’re not. The friction shows up not because anyone misunderstands the technology, but because the language hasn’t kept up with how quickly the idea of openness has expanded.

Open-Source Software

In terms of software, “open” means open-source. In that context it has a clear meaning. You can read the code, change it, and share it as long as you follow the license. That predictability is a big part of why the movement grew. People understood the rules and trusted that the rules would hold.

But the full spectrum of open-source shows up in the habits and culture around it. Communities develop their own rhythms for how to submit a pull request, file a useful bug report, talk through disagreements, and decide which features or fixes make it into a release. None of that comes from a license. People learn it by watching others work, answering questions in long issue threads, or showing up in mailing lists and channels where projects live.

There’s also an unspoken agreement inside open-source software. If a project helps you, you try to help it back. Maybe you fix a typo, or you donate, or you answer someone else’s question. It’s not required, but it’s how projects stay healthy.

Anyone who has maintained an open-source project knows it isn’t glamorous. It can be repetitive, sometimes thankless, and often demanding. Good maintainers end up juggling technical decisions, community management, and the occasional bit of diplomacy.

All this shapes a shared understanding of what openness means in software. People think not just about reading code, but about the whole ecosystem built around it: contribution paths, governance models, release practices, and the blend of freedom and responsibility that holds everything together.

Once the idea of openness moved beyond software, that ecosystem didn’t necessarily apply. As other fields developed their own approaches to openness, patterns and practices evolved to fit each domain.

Open Data

Open data developed along its own path. Instead of code, publishers released information about the world: transit schedules, land use maps, environmental readings, census tables. The goal was simple: make public data easier to access so people could put it to use.

Because software licenses didn’t fit, data and content licenses such as Creative Commons were developed. CC BY and CC0 became common. Open Data Commons created specialized database licenses—ODbL added share-alike requirements specifically for databases, while PDDL offered a public domain dedication. You can see the differences in well-known datasets. OpenStreetMap’s ODbL means derived data often has to stay open and always requires attribution. USGS datasets, which are mostly public domain, are easy to fold into commercial tools. City transit feeds under CC BY only ask for attribution.

Privacy concerns complicate open data, which isn’t exempt from GDPR, CCPA, or similar laws. Even seemingly innocuous data can reveal personal information—location datasets showing frequent stops at specific addresses, or timestamped transit records that establish movement patterns. Many publishers address this through aggregation, anonymization, or by removing granular temporal and spatial details, but anyone working with open data still ends up checking metadata, tracking where files came from, and thinking about what patterns a dataset might reveal.
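One common mitigation is to coarsen precision before release. Below is a minimal sketch of the idea; the rounding level and time granularity are illustrative, and how much generalization is appropriate depends on the dataset and the applicable law.

```python
from datetime import datetime

def generalize_point(lat, lon, decimals=2):
    """Round coordinates to reduce precision (roughly 1 km at 2 decimals)."""
    return (round(lat, decimals), round(lon, decimals))

def coarsen_timestamp(ts, fmt="%Y-%m-%d %H:%M:%S"):
    """Drop minutes and seconds, keeping only the hour."""
    dt = datetime.strptime(ts, fmt)
    return dt.replace(minute=0, second=0).strftime(fmt)

# A single (hypothetical) GPS record before publication.
record = {"lat": 38.889484, "lon": -77.035278, "time": "2024-05-01 08:37:12"}
lat, lon = generalize_point(record["lat"], record["lon"])
when = coarsen_timestamp(record["time"])
```

Generalization like this trades analytical detail for privacy, which is exactly the balance open data publishers have to strike.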

Open Source Information (OSINT)

Open-Source Intelligence (OSINT) is an overlapping but distinct concept from open data. Information is considered “open” in OSINT because anyone can access it, not because anyone has the right to reuse it. A ship tracker, a social media post, and a livestream from a city camera are all examples of data that may fall into this category.

These sources vary widely in reliability. Some come from official databases or verified journalism. Others come from unvetted social media content, fast-moving crisis reporting, or user-generated material with no clear provenance. OSINT analysts rely heavily on validation techniques such as correlation, triangulation, consensus across multiple sources, and structured analytic methods.
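The corroboration step can be illustrated with a small sketch: keep only claims that at least two independent sources report. The source names and claims below are hypothetical.

```python
def corroborate(claims, min_sources=2):
    """Group claims by asserted value and keep those reported
    by at least `min_sources` independent sources."""
    support = {}
    for source, value in claims:
        support.setdefault(value, set()).add(source)
    return {v for v, sources in support.items() if len(sources) >= min_sources}

# Hypothetical reports about a vessel's position.
reports = [
    ("ais_feed", "vessel at 36.1N, 5.4W"),
    ("social_post", "vessel at 36.1N, 5.4W"),
    ("blog", "vessel at 40.0N, 3.0W"),
]
confirmed = corroborate(reports)  # only the doubly-sourced position survives
```

Real OSINT validation is far richer than counting sources, but the underlying discipline is the same: no single unvetted source gets treated as ground truth.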

While OSINT has deep roots in government intelligence work, it is now widely practiced across sectors including journalism, cybersecurity, disaster response, financial services, and competitive intelligence. Marketing technologies have expanded OSINT further into the private sector, making large-scale collection and analysis tools widely accessible.

Confusion can arise when open data and OSINT are treated as interchangeable. Someone may say they used open data, meaning a licensed dataset from a government portal. Someone else hears open and assumes it means scraping whatever is publicly visible.

This distinction matters because the two categories carry fundamentally different permissions and obligations. Open data comes with explicit rights to reuse, modify, and redistribute—legal clarity that enables innovation and collaboration. OSINT, by contrast, exists in a gray area where accessibility doesn’t imply permission, and users must navigate copyright, privacy laws, and terms of service on a case-by-case basis. 

Understanding this difference isn’t just semantic precision; it shapes how organizations design data strategies, assess legal risks, and build ethical frameworks for information use. When practitioners clearly specify whether they’re working with licensed open data or publicly accessible OSINT, they help prevent costly misunderstandings and ensure their work rests on solid legal and ethical foundations.

Open AI Models

In AI, openness takes on another meaning entirely. A model is more than code or data. It’s architecture, training data, weights, and the training process that binds everything together. So when a model is described as open, it’s natural to ask which part is actually open.

You see the variety in projects released over the past few years. Some groups publish only the weights and keep the training data private. Meta’s Llama models fall into this category. You can download and fine-tune them, but you don’t see what went into them. Others release architectural details and research papers without sharing trained weights—early transformer work from Google and OpenAI showed the approach without providing usable models. GPT-NeoX took a middle path, releasing both architecture and weights but with limited training data transparency.

A few projects aim for full transparency. BLOOM is the most visible example, with its openly released code, data sources, logs, and weights. It took a global effort to pull that off, and it remains the exception, though projects like Falcon and some smaller research models have attempted similar transparency.

This partial openness shapes how people use these models. If you only have the weights, you can run and fine-tune the model, but you can’t inspect the underlying data. When the training corpus stays private, the risks and biases are harder to understand. And when licenses restrict use cases, as they do with Llama’s custom license that prohibits certain commercial applications, or research-only models like many academic releases, you might be able to experiment but not deploy. Mistral’s models show another approach—Apache 2.0 licensing for some releases but custom licenses for others.
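In practice, teams often record these terms somewhere queryable before building on a model. The sketch below uses purely illustrative values, not authoritative license terms; always check the actual license text for the specific release you are using.

```python
# Hypothetical, simplified metadata for a few model releases.
# Real terms vary by version and are more nuanced than a single flag.
MODEL_TERMS = {
    "llama-2-7b": {"weights": True, "training_data": False, "commercial": "restricted"},
    "bloom-176b": {"weights": True, "training_data": True, "commercial": "rail-license"},
    "mistral-7b": {"weights": True, "training_data": False, "commercial": "apache-2.0"},
}

def can_deploy_commercially(model):
    """Rough screen: treat only permissive licenses as deployable without review."""
    terms = MODEL_TERMS.get(model)
    if terms is None:
        raise KeyError(f"no recorded terms for {model}; check the license first")
    return terms["commercial"] in {"apache-2.0", "mit"}
```

Even a crude table like this forces the question "open in what sense?" to be answered once, up front, instead of at deployment time.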

The idea of contribution looks different too. You don’t patch a model the way you patch a library. You build adapters, write wrappers, create LoRA fine-tunes, or train new models inspired by what came before. Openness becomes less about modifying the original artifact and more about having the freedom to build around it.
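The LoRA idea itself is simple: instead of modifying the frozen weight matrix W, you train a small low-rank update and add it at inference time, W' = W + (alpha/r)·BA. A toy sketch with tiny matrices (real implementations operate on large tensors inside the model's layers):

```python
def matmul(A, B):
    """Naive matrix multiply, adequate for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(A, B)]

def lora_effective_weight(W, B, A, alpha=1.0, rank=1):
    """W' = W + (alpha / rank) * B @ A, the low-rank update used by LoRA."""
    scale = alpha / rank
    delta = [[scale * x for x in row] for row in matmul(B, A)]
    return add(W, delta)

# A frozen 2x2 base weight plus a rank-1 update trained separately.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # 2x1
A = [[0.5, 0.5]]     # 1x2
W_eff = lora_effective_weight(W, B, A)
```

The adapter (B and A) can be shipped on its own, which is why fine-tunes circulate freely even when the base model's own openness is partial.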

So in AI, open has become a spectrum. Sometimes it means transparent. Sometimes it means accessible. Sometimes it means the weights are downloadable even if everything else is hidden. The word is familiar, but the details rarely match what openness means in software or data.

Real World Considerations

These differences are fairly straightforward when they live in their own domains. Complexity can arise when they meet inside real systems. Modern projects rarely stick to one tradition. A workflow might rely on an open-source library, an open dataset, publicly scraped OSINT, and a model with open weights, and each piece brings its own rules.

Teams can run into this without realizing it. Someone pulls in an Apache-licensed geospatial tool and combines it smoothly with CC BY data. These work fine together. But then someone else loads OpenStreetMap data without noticing the share-alike license that affects everything it touches. A third person adds web-scraped location data from social media, not considering the platform’s terms of service or privacy implications. A model checkpoint from Hugging Face gets added on top, even though the license limits commercial fine-tuning. Most of these combinations are manageable with proper documentation, but some create real legal barriers.
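A lightweight way to catch these conflicts early is to record a few traits per component and aggregate the obligations they impose on the combined work. The traits below are deliberately simplified and illustrative; real compatibility analysis needs legal review.

```python
# Simplified, illustrative license traits -- not legal advice.
LICENSES = {
    "apache-2.0": {"share_alike": False, "commercial": True},
    "cc-by-4.0": {"share_alike": False, "commercial": True},
    "odbl-1.0": {"share_alike": True, "commercial": True},
    "scraped-tos": {"share_alike": False, "commercial": False},  # visible, not licensed
}

def pipeline_constraints(components):
    """Aggregate the obligations a set of components imposes on a combined work."""
    share_alike = any(LICENSES[c]["share_alike"] for c in components)
    commercial_ok = all(LICENSES[c]["commercial"] for c in components)
    return {"derived_work_must_stay_open": share_alike, "commercial_use_ok": commercial_ok}

constraints = pipeline_constraints(["apache-2.0", "cc-by-4.0", "odbl-1.0", "scraped-tos"])
```

Run against the scenario above, the check surfaces both problems at once: the ODbL component makes share-alike contagious, and the scraped component blocks commercial use entirely.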

Expectations collide too. A software engineer assumes they can tweak anything they pull in. A data analyst assumes the dataset is stable and comes with clear reuse rights. An OSINT analyst assumes publicly visible means fair game for analysis. Someone working with models assumes fine-tuning is allowed. All reasonable assumptions inside their own worlds, but they don’t line up automatically.

The same thing happens in procurement. Leadership hears open and thinks it means lower cost or fewer restrictions. But an open-source library under Apache is not the same thing as a CC BY dataset, neither is the same as scraped public data that’s accessible but not licensed, and none of those are the same as an open-weight model with a noncommercial license.

Geospatial and AI workflows feel these tensions even more. They rarely live cleanly in one domain. You might preprocess imagery with open-source tools, mix in open government data, correlate it with ship tracking OSINT, and run everything through a model that’s open enough to test but not open enough to ship. Sometimes careful documentation and attribution solve the problem. Other times, you discover a share-alike clause or terms-of-service violation that requires rethinking the entire pipeline.

This is when teams slow down and start sorting things out. Not because anyone did something wrong, but because the word open did more work than it should have and because “publicly accessible” got mistaken for “openly licensed.”

Clarifying Open

A lot of this gets easier when teams slow down just enough to say what they actually mean by open. It sounds almost too simple, but it helps. Are we talking about open code, open data, open weights, open access research, or just information that’s publicly visible? Each one carries its own rules and expectations, and naming it upfront clears out a lot of the fog.

Most teams don’t need a formal checklist, though those in regulated industries, government contracting, or high-stakes commercial work often do. What every team needs is a little more curiosity about the parts they’re pulling in—and a lightweight way to record the answers. If someone says a dataset is open, ask under what license and note it in your README or project docs. If a model is open, check whether that means you can fine-tune it, use it commercially, or only experiment with it—and document which version you’re using, since terms can change. If a library is open-source, make sure everyone knows what the license allows in your context. If you’re using publicly visible information—social media posts, ship trackers, livestreams—be clear that this is OSINT, not licensed open data, and understand what legal ground you’re standing on.

These questions matter most at project boundaries: when you’re about to publish results, share with partners, or move from research to production. A quick decision log—even just a shared document listing what you’re using and under what terms—prevents expensive surprises. It also helps when someone new joins the team or when you revisit the project months later.
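A decision log does not need special tooling; even a tiny script or a shared document works. A minimal sketch, with hypothetical entries:

```python
import json
from datetime import date

def log_component(log, name, kind, terms, source):
    """Append one entry to a lightweight decision log kept alongside the project."""
    log.append({
        "date": date.today().isoformat(),
        "component": name,
        "kind": kind,      # e.g. "code", "data", "weights", "osint"
        "terms": terms,    # license id, or "publicly visible, unlicensed"
        "source": source,
    })
    return log

decisions = []
log_component(decisions, "OpenStreetMap extract", "data", "ODbL-1.0", "geofabrik.de")
log_component(decisions, "ship positions", "osint", "publicly visible, unlicensed", "public AIS feed")
print(json.dumps(decisions, indent=2))
```

The format matters far less than the habit: every component gets a line stating which kind of "open" it actually is.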

The more people get used to naming the specific flavor of openness they’re dealing with and writing it down somewhere searchable, the smoother everything else goes. Projects move faster when everyone shares the same assumptions. Compliance reviews become straightforward when the licensing story is already documented. Teams stop discovering deal-breakers right when they’re trying to ship something. It’s not about being overly cautious or building heavy process. It’s just about giving everyone a clear, recorded starting point before the real work begins.

Conclusion

If there’s a theme running through all of this, it’s that the word open has grown far beyond its original boundaries. It meant one thing in software, something different in the world of public data, another in AI, and gets stretched even further when people conflate it with simply being publicly accessible. Each tradition built its own norms and expectations, and none of them are wrong. They just don’t automatically line up the way people sometimes expect.

Most of the friction we see in real projects doesn’t come from bad decisions. It comes from people talking past one another while using the same word. A workflow can look straightforward on paper but fall apart once you realize each component brought its own version of openness, or that some parts aren’t “open” at all, just visible. By the time a team has to sort it out, they’ve already committed to choices they didn’t realize they were making.

The good news is that this is manageable. When people take a moment to say which kind of open they mean, or acknowledge when they’re actually talking about OSINT or other public information, everything downstream gets smoother: design, licensing, procurement, expectations, even the conversations themselves. It turns a fuzzy idea into something teams can actually work with. It requires ongoing attention, especially as projects grow and cross domains, but the effort pays off.

Openness is a powerful idea, maybe more powerful now than ever. But using it well means meeting it where it actually is, not where we assume it came from.

At Cercana Systems, we have deep experience with the full open stack and can help you navigate the complexities as you implement open assets in your organization. Contact us to learn more.

Header image credit: Aaron Pruzaniec, CC BY 2.0 https://creativecommons.org/licenses/by/2.0, via Wikimedia Commons

Reflections on the Process of Planning FedGeoDay 2025

What is FedGeoDay?

FedGeoDay is a single-track conference dedicated to federal use cases of open geospatial ecosystems. These ecosystems take a wide variety of forms, but largely include anything designed around open data, open-source software, and open standards. The main event is a one-day commitment and is followed by a day of optional hands-on workshops.

FedGeoDay has existed for roughly a decade, serving as a day of learning, networking, and collaboration in the Washington, D.C. area. Recently, Cercana Systems president Bill Dollins was invited to join the planning committee, and served as one of the co-chairs for FedGeoDay 2024 and 2025. His hope is that attendees are able to come away with practical examples of how to effectively use open geospatial ecosystems in their jobs.

Photo courtesy of OpenStreetMap US on LinkedIn.

“Sometimes the discussion around those concepts can be highly technical and even a little esoteric, and that’s not necessarily helpful for someone who’s just got a day job that revolves around solving a problem. Events like this are very helpful in showing practical ways that open software and open data can be used.”

Dollins joined the committee for a multitude of reasons. In this post, we will explore some of his reasons for joining, as well as what he thinks he brings to the table in planning the event and things he has learned from the process. 

Why did you join the committee?

When asked why he joined the planning committee for FedGeoDay, Dollins said his primary purpose was to give back, in a hands-on way, to a community that has been helpful and valuable to him throughout his career.

“In my business, I derive a lot of value from open-source software. I use it a lot in the solutions I deliver in my consulting, and when you’re using open-source software you should find a way that works for you to give back to the community that developed it. That can come in a number of ways. That can be contributing code back to the projects that you use to make them better. You can develop documentation for it, you can provide funding, or you can provide education, advocacy, and outreach. Those last three components are a big part of what FedGeoDay does.”

He also says that while being a co-chair of such an impactful event helps him maintain visibility in the community, getting the opportunity to keep his teamwork skills fresh was important to him, too.

“For me, also, I’m self-employed. Essentially, I am my team,” said Dollins. “It can be really easy to sit at your desk and deliver things and sort of lose those skills.”

What do you think you brought to the committee?

Dollins has had a long career in the geospatial field and has spent the majority of his time in leadership positions, so he was confident in his ability to contribute in this new form of leadership role. Event planning is a beast of its own, but early on in the more junior roles of his career, the senior leadership around him went out of their way to teach him about project cost management, staffing, and planning agendas. He then was able to take those skills into a partner role at a small contracting firm where he wore every hat he could fit on his head for the next 15 years, including still doing a lot of technical and development work. Following his time there, he had the opportunity to join the C-suite of a private sector SaaS company and was there for six years, really rounding out his leadership experience. 

He felt one area where he lacked experience was community engagement, and event planning is a great way to develop those skills.

“Luckily, there’s a core group of people who have been planning and organizing these events for several years. They’re generally always happy to get additional help and they’re really encouraging and really patient in showing you the rules of the road, so that’s been beneficial, but my core skills around leadership were what applied most directly. It also didn’t hurt that I’ve worked with geospatial technology for over 30 years and open-source geospatial technology for almost 20, so I understood the community these events serve and the technology they are centered around,” said Dollins.

Photo courtesy of Ran Goldblatt on LinkedIn.

What were some of the hard decisions that had to be made?

Photo Courtesy of Cercana Systems on LinkedIn.

Attendees of previous FedGeoDays will likely remember that the event has always been free for feds to attend. The planning committee, upon examining the revenue sheets from last year’s event, noted that the single largest unaccounted-for cost was the free luncheon. A post-event survey was sent out, and federal attendees largely indicated that they would not take issue with contributing $20 to cover the cost of lunch. However, the landscape of the community changed in a manner most people did not see coming.

“We made the decision last year, and keep in mind the tickets went on sale before the change of administration, so at the time we made the decision last year it looked like a pretty low-risk thing to do,” said Dollins.

Dollins went on to say that while the landscape shifts any time the administration changes, even when the party in power stays the same, this one has been particularly jarring.

“There’s probably a case to be made that we could have bumped up the cost of some of the sponsorships and possibly the industry tickets a little bit and made an attempt to close the gap that way. We’ll have to see what the numbers look like at the end. The most obvious variable cost was the cost of lunches against the free tickets, so it made sense to do last year and we’ll just have to look and see how the numbers play out this year.”**

What have you taken away from this experience?

Dollins says one of the biggest takeaways from the process of helping to plan FedGeoDay has been learning to apply leadership in a different context. Throughout most of his career, he has served as a leader in traditional team structures with a clearly defined hierarchy and specified roles. Working with a team of volunteers who have day jobs of their own to worry about requires a different approach.

“Everyone’s got a point of view, everyone’s a professional and generally a peer of yours, and so there’s a lot more dialogue. The other aspect is that it also means everyone else has a day job, so sometimes there’s an important meeting and the one person that you needed to be there couldn’t do it because of that. You have to be able to be a lot more asynchronous in the way you do these things. That’s a good thing to give you a different approach to leadership and teamwork,” said Dollins on the growth opportunity.

Dollins has even picked up some new work from his efforts on the planning committee by virtue of getting to work and network with people who weren’t necessarily in his circle beforehand. Though he’s worked in the geospatial field for 30 years and focused heavily on open-source work for 20, he says he felt hidden away from the community in a sense during his time in the private sector.

Photo courtesy of Lane Goodman on LinkedIn.

“This has helped me get back circulating in the community and to be perceived in a different way. In my previous iterations, I was seen mainly from a technical perspective, and so this has kind of helped me let the community see me in a different capacity, which I think has been beneficial.”

FedGeoDay 2025 has concluded and was a huge success for all involved. Cercana Systems looks forward to continuing to sponsor the event going forward, and Dollins looks forward to continuing to help this impactful event bring the community together in the future. 

Photo courtesy of Cercana Systems on LinkedIn.

**This interview was conducted before FedGeoDay 2025 took place. The event exceeded the attendance levels of FedGeoDay 2024. 

FedGeoDay 2025 Highlights

The Cercana Systems team had a wonderful time attending FedGeoDay 2025 in Washington, D.C.! It was fun to catch up with long-time colleagues, make new professional connections, and learn how a wide array of new projects are contributing to the ever-evolving world of open geospatial ecosystems. 

A standout highlight was the in-depth keynote by Katie Picchione of NASA’s Disasters Program on the critical role played by open geospatial data in disaster response. Additionally, Ryan Burley of GeoSolutions moderated an excellent panel on Open-Source Geospatial Applications for Resilience, and Eddie Pickle of Crunchy Data led an energetic panel on Open Data for Resilience. 

We were especially excited about the “Demystifying AI” panel with panelists Emily Kalda of RGi, Jason Gilman of Element 84, Ran Goldblatt of New Light Technologies, and Jackie Kazil of Bana Solutions which was moderated by Cercana’s president Bill Dollins.

Location is an increasingly important component of cybersecurity and FedGeoDay featured a fireside chat on cybersecurity led by Ashley Fairman of DICE Cyber.  On either side of the lunch break, Wayne Hawkins of RGi moderated a series of informative lightning talks on a range of topics. 

FedGeoDay was a content-rich event that was upbeat from beginning to end. We are grateful to all of the presenters and panelists for taking the time to share their knowledge and to the organizing committee for their work in pulling together such a high-quality event. Cercana is proud to support FedGeoDay and looks forward to continuing to do so for years to come.