Getting back into the swing of meetups again, last night I went to the MTLData meetup – a group of data scientists and enthusiasts who are looking to raise the profile of data science in Montreal. The event featured a panel discussion on the topic of ‘Build vs Buy?’ when considering software for data solutions.
The panellists were Marc-Antoine Ross, Director of Data Engineering at Intel Security, Maxime Leroux, consulting data scientist at Keyrus Canada, and Jeremy Barnes, Chief Architect at Element AI. The chair was Vaughan DiMarco of Vonalytics.
Data as liquid
The issues were very familiar to me from considering EDRM and DAM systems, which made me think about the way data has changed as an asset, and how the management and security of data now have to take account of its ‘liquid’ nature. This adds another layer of complexity. Data still needs to be archived as a ‘record’ for many reasons (regulatory compliance, business continuity, archival value…) but for a data-driven organisation, the days of rolling back to ‘yesterday’s version of the database’ seem like ancient history. Data assets are also complex in that they are subject to many levels of continuous processing, so the software that manages the processing also has to be robust.
The metaphor of data flowing around the organisation like water seems especially telling. If there is a system failure, you can’t necessarily just turn off the tap of data, and so your contingency plans need to include some kind of ‘emergency reservoir’ so that data that can’t be processed immediately does not get lost and the flow can be re-established easily.
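To make the ‘emergency reservoir’ idea a little more concrete, here is a minimal sketch in Python. It is only an illustration of the metaphor, not something the panel presented: the names `Reservoir` and `FlakyProcessor` are hypothetical, and the buffer lives in memory, whereas a real system would presumably need durable storage so that held data survives a restart.

```python
from collections import deque
from typing import Callable, Deque, List


class Reservoir:
    """Holds records that could not be processed, so they are not lost."""

    def __init__(self, process: Callable[[str], None]) -> None:
        self._process = process            # downstream processing step (hypothetical)
        self._held: Deque[str] = deque()   # records waiting to be replayed

    def submit(self, record: str) -> bool:
        """Process a record now if possible; otherwise hold it. Returns True if processed."""
        if self._held:
            # Earlier records are still waiting, so queue behind them to keep arrival order.
            self._held.append(record)
            return False
        try:
            self._process(record)
            return True
        except Exception:
            self._held.append(record)
            return False

    def drain(self) -> int:
        """Replay held records in arrival order, stopping at the first failure."""
        replayed = 0
        while self._held:
            try:
                self._process(self._held[0])
            except Exception:
                break
            self._held.popleft()
            replayed += 1
        return replayed

    def pending(self) -> int:
        """Number of records currently held back."""
        return len(self._held)


class FlakyProcessor:
    """Stand-in for a processing system that can be switched off to simulate an outage."""

    def __init__(self) -> None:
        self.up = True
        self.seen: List[str] = []

    def __call__(self, record: str) -> None:
        if not self.up:
            raise RuntimeError("processor unavailable")
        self.seen.append(record)
```

During an outage, `submit` keeps accepting records and holds them back; once the processor is available again, calling `drain` replays them in the order they arrived, which is the ‘re-establishing the flow’ step.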
Build vs Buy?
The issues highlighted by the panel included costs (available budget, restrictions from finance departments, and the balance between in-house and outsourced spending, again all familiar from EDRM and DAM procurement), privacy, security, the ability to maintain a system, and the availability of skills. Essentially, the decision is about balancing risks, which will be unique to each team and each business. In terms of deciding whether to build something in-house, availability of in-house resources is an obvious consideration, but Marc-Antoine stressed the importance of thinking through what added value a bespoke build could offer, as opposed to other ways the team could be spending their time. For example, if there are no off-the-shelf or open source products that match requirements, if there is value in owning the IP of a new product, if risks can be kept low, and if resources are available, a build might be worthwhile.
There are risks associated with all three of the main options: building in-house, buying from a vendor, or adopting open source. A big vendor is less likely to go bust, but it can be acquired, it can stop supporting a product or particular features, and it can be very costly. Open source has the advantage of being free, but relies on ad hoc communities to maintain and update the code base, and how vibrant and responsive each community is, or will remain, can vary. Open source can be a good option for low-risk projects, such as proofs of concept, or for risk-tolerant startups with plenty of in-house expertise to handle the open source code themselves.
The conversation diverged into a discussion of the future of AI, which everyone seemed to agree was going to become a standard tool for most businesses eventually. Jeremy noted that AI at the moment is being sought after for its scarcity value, to give early adopters an edge over the competition, while Maxime suggested that this early advantage is likely to fade, just as it has with data science. Data analysis is now so ubiquitous that even small businesses are involved to a certain extent. Jeremy pointed out that it is hard to maintain a competitive edge based on the scarcity of data itself, as data can so easily be copied and distributed, but knowing how to make intelligent use of the data is a scarce commodity. Making connections and managing data in a tailored, specific way could even be a way for organisations to compete with Google, which has more data than anyone else but is not necessarily able to answer all questions or offer the most useful insights into specific problems.
The value of meaning
I was intrigued by this, as it validates the role of semantics – data without meaning is useless – and the importance of the imaginative and creative leaps that humans can make, as well as the moral and social reasoning that humans can bring. With reports of early AI systems reflecting existing biases and prejudices, and with disasters like the SimSimi chatbot causing social problems such as bullying amongst youngsters, the need for a real human heart to accompany artificial intelligence seems ever more important.
Scarcity of understanding?
Someone asked if the panel thought companies would soon need