Data Landscape Canvas
Why to use?
Explore the data landscape to identify essential and useful data sources for a data/AI product.
When to use?
After you have identified relevant data sets for a data/AI product with the Data Monetization canvas, use the Data Landscape canvas to identify, locate, and categorize potential data sources and to check their availability and quality.
How to use?
I. Preparation
1. Fill the canvas header and footer:
a) Header: Label the Focus on field with a white sticky note carrying the name of the data/AI product. You can copy the Focus on sticky note from the Data Monetization canvas you were previously working on.
b) Footer: Add a legend with sticky notes in the corresponding colors:
Green sticky notes: Data asset: data source
Yellow sticky notes: Data asset: data source (with issue)
Red sticky notes: Data gap
Blue sticky notes: Data/AI product or desired information
White sticky notes: Critical assumption or open question
II. Data Sourcing
② To direct your data exploration, add a blue sticky note to the central box Data Product. Label it with the name of the data/AI product or, for even tighter focus, with the desired information.
③-⑥ Consider all the data necessary or useful for this data/AI product or the desired information. Utilize sticky notes in the following colors, or incorporate pre-existing ones from the Sort in box:
Green: Indicates that the data set is available from a data source. Label the green sticky note as follows: "data set: data source". If the data set is available in different versions, formats, or sources, add one green sticky note per variant and append a remark to the label, e.g., "data set: data source (remark)".
Red: Indicates that the data set is unavailable. Label the red sticky note with a descriptive title.
Yellow: Indicates that the data set is available but has issues related to accessibility, quality, privacy, security, etc. Label the yellow sticky note as follows: "data set: data source (issue)".
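If the canvas is later transcribed into a data inventory, the labeling convention above can be read by a small parser. The following is a minimal sketch, not part of the canvas method itself: the names StickyNote, parse_note, and STATUS_BY_COLOR, as well as the example labels, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from sticky-note color to availability status,
# following the color scheme described above.
STATUS_BY_COLOR = {
    "green": "available",
    "yellow": "available with issue",
    "red": "data gap",
}

# Matches "data set: data source" with an optional trailing "(remark or issue)".
LABEL_PATTERN = re.compile(
    r"^\s*(?P<data_set>[^:]+?)\s*:\s*(?P<source>[^()]+?)\s*(?:\((?P<note>.*)\))?\s*$"
)


@dataclass
class StickyNote:
    color: str
    data_set: str
    source: Optional[str] = None
    note: Optional[str] = None

    @property
    def status(self) -> str:
        return STATUS_BY_COLOR[self.color]


def parse_note(color: str, label: str) -> StickyNote:
    """Turn a sticky-note label into a structured record."""
    if color == "red":
        # Red notes mark a data gap and carry only a descriptive title.
        return StickyNote(color=color, data_set=label.strip())
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Label does not follow 'data set: data source': {label!r}")
    return StickyNote(color, match["data_set"], match["source"], match["note"])


note = parse_note("yellow", "Customer orders: ERP system (incomplete before 2020)")
print(note.data_set, "|", note.source, "|", note.note, "|", note.status)
```

Keeping the issue or remark in its own field makes it easy to list all yellow notes by type of issue after the workshop.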
Place these sticky notes into one of the four data landscape quadrants, then discuss the implications:
③ Owned Data: Using owned data sets can provide a defensible competitive advantage, as competitors may not be able to replicate the data. Owned data sets have no limitations on usage, so prioritize building your data/AI products on owned data.
④ Earned Data: Earned data, such as that from customers or suppliers, might have usage restrictions imposed by individual contracts or legal regulations (e.g., data privacy laws). If potential legal issues exist, change the sticky note from green to yellow to indicate this risk and name the issue.
⑤ Paid Data: Paid data often comes with stricter limitations, and exclusivity is usually not guaranteed. It's common for different departments within the same company to unknowingly purchase the same data set multiple times. To avoid such redundancy, always conduct a thorough review of the data and your existing data licensing agreements before finalizing any purchase.
⑥ Public Data: Public data typically lacks exclusivity, and many data sets come with restrictive usage rights, such as non-commercial use only. Notably, open data might be subject to "copyleft" agreements: using such data could require you to make your data products open source as well.
Also decide whether it is raw or derived data and think about the implications:
a) Raw data: Contains all original information but may be difficult to handle and use efficiently.
b) Derived data: This data has undergone processes like cleansing, normalization, aggregation, anonymization, etc., which results in the loss of some original information. Consider the implications of these changes on your data utility.
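The information loss in derived data can be illustrated with a small sketch. The store figures and the derive_summary helper below are hypothetical and chosen only to show that different raw data can collapse into the same derived record.

```python
from statistics import mean

# Hypothetical raw data: daily sales for two stores.
raw_store_a = [10, 20, 30]
raw_store_b = [20, 20, 20]


def derive_summary(values):
    """Aggregate raw values into a derived record (count and mean only)."""
    return {"count": len(values), "mean": mean(values)}


summary_a = derive_summary(raw_store_a)
summary_b = derive_summary(raw_store_b)

# Both stores produce the same derived record although the raw data differ,
# so the raw values cannot be reconstructed from the derived data.
print(summary_a == summary_b)  # True

# The day-to-day spread is visible only in the raw data.
print(max(raw_store_a) - min(raw_store_a))  # 20
print(max(raw_store_b) - min(raw_store_b))  # 0
```

If the data/AI product needs the variability of the sales, the aggregated data set alone would not be a suitable source.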
Depending on the legal and processing requirements, determine the most suitable data sources for your data/AI product. Place white sticky notes beside the selected data sets to document your decisions and reasons, and/or remove any unnecessary sticky notes.
III. Data Brainstorming (Optional)
In machine learning, a larger quantity of data often improves model performance. However, companies typically concentrate only on their own data, overlooking the vast potential of public and paid data sources. To expand your data horizons, consider exploring additional data possibilities:
③ Owned Data: How can we modify our business model and/or processes to capture more or different types of data? Review the Business Model, Value Chain, Customer Touchpoints, and Analytics & AI Maturity (Business Operations box) canvases for insights.
④ Earned Data: What additional data could our customers and partners provide us with? Check the Business Model canvas to identify potential customers and partners and use (a copy of) the Business Model canvas to analyze the business model of your (B2B) customers or partners.
⑤ Paid Data: What external data could we purchase or exchange for our data? Use the Value Chain canvas to analyze the whole value chain of your industry and look for your customers' customers and/or suppliers' suppliers.
⑥ Public Data: What public data sources might contain relevant data? Think outside the box and be creative: are there any proxy variables? For instance, the number of edits on a Wikipedia page about a movie can strongly predict the movie’s revenue.
IV. Data Linking
The final, critical step is to ensure that all the data sets link together. To do so, identify all necessary identifiers or other linkage data (e.g., date, GPS coordinates, ZIP code), locate the data source for each (e.g., the master database system), and assess its availability (green, yellow, or red sticky note). Then place the sticky notes in the corresponding Link Data boxes (③-⑥c), depending on the data source's origin.
Using dotted lines, connect each data source sticky note to the corresponding link data sticky note based on shared identifiers or linkage data. At the end, the entire graph should be fully connected to allow for comprehensive data integration and analysis.
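Whether the graph is fully connected can also be checked programmatically once the sticky notes and dotted lines are written down as nodes and edges. The following is a minimal sketch using a breadth-first search; the function name connected_components and the example notes are hypothetical.

```python
from collections import deque


def connected_components(nodes, edges):
    """Group sticky notes into sets that are linked by dotted lines."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        # Dotted lines have no direction, so store each link both ways.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    components = []
    seen = set()
    for node in nodes:
        if node in seen:
            continue
        # Breadth-first search collects everything reachable from this note.
        component = set()
        queue = deque([node])
        seen.add(node)
        while queue:
            current = queue.popleft()
            component.add(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical canvas content: three data sources and two link data notes.
notes = ["orders: ERP", "customers: CRM", "weather: public API", "customer ID", "date"]
links = [
    ("orders: ERP", "customer ID"),
    ("customers: CRM", "customer ID"),
    ("orders: ERP", "date"),
]

components = connected_components(notes, links)
print(len(components))  # 2: the weather data is not linked to anything yet
```

A result of exactly one component means the graph is fully connected. In this example, adding a dotted line between "weather: public API" and "date" would close the gap.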