machine learning

Patrick Ball, Director of Research, Human Rights Data Analysis Group, offered examples of when statistics and machine learning have proved useful and when they’ve failed in this presentation from Open Source Summit Europe.

Machine learning and statistics are playing a pivotal role in finding the truth in human rights cases around the world – and serving as a voice for victims, Patrick Ball, director of research for the Human Rights Data Analysis Group, told the audience at Open Source Summit Europe.

Ball began his keynote, “Digital Echoes: Understanding Mass Violence with Data and Statistics,” with background on his career, which started in 1991 in El Salvador, building databases. While working with truth commissions from El Salvador to South Africa to East Timor, with international criminal tribunals as well as local groups searching for lost family members, he said, “one of the things that we work with every single time is trying to figure out what the truth means.”

In the course of the work, “we’re always facing people who apologize for mass violence. They tell us grotesque lies that they use to attempt to excuse this violence. They deny that it happened. They blame the victims. This is common, of course, in our world today.”

Human rights campaigns “speak with the moral voice of the victims,” he said. It is therefore critical that statistics, including machine learning, are accurate, Ball added.

He gave three examples of when statistics and machine learning proved to be useful, and where they failed.

Finding missing prisoners

In the first example, Ball recalled his participation as an expert witness in the trial of a war criminal, the former president of Chad, Hissène Habré. Thousands of documents were presented, which had been discovered as a pile of trash in an abandoned prison and which turned out to be the operational records of the secret police.

The team homed in on one type of document that detailed the number of prisoners held at the beginning of the day, the number held at the end of the day, and what accounted for the difference: prisoners who were released, new prisoners brought in, those transferred to other places, and those who had died during the course of the day. Dividing the number of people who died throughout the day by the number alive in the morning produces the crude mortality rate, he said.
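The arithmetic Ball describes can be sketched in a few lines. This is a minimal illustration, not the team's actual code: the function names are hypothetical, and the figures in the usage note are invented for the example rather than taken from the Chad records.

```python
def end_of_day_count(start, admitted, released, transferred, died):
    """Reconcile a daily log: prisoners held at the end of the day.

    All arguments are head counts for a single day (hypothetical field names).
    """
    return start + admitted - released - transferred - died


def crude_mortality_rate(died, alive_at_start):
    """Deaths during the day divided by the number alive in the morning."""
    if alive_at_start <= 0:
        raise ValueError("alive_at_start must be positive")
    if died < 0:
        raise ValueError("died cannot be negative")
    return died / alive_at_start
```

With made-up numbers, a day that starts with 400 prisoners, admits 10, releases 5, transfers 3, and records 2 deaths ends with 400 prisoners and a crude mortality rate of 2 / 400 = 0.005 for that day.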

The status of the prisoners of war was critical in the trial of Habré because the crude mortality rate was “extraordinarily high,” he said.

“What we’re doing in human rights data analysis is … trying to push back on apologies for mass violence. In fact, the judges in the [Chad] case saw precisely that usage and cited our evidence … to reject President Habré’s defense that conditions in the prison were nothing extraordinary.”

That’s a win, Ball stressed, since human rights advocates don’t see many wins, and the former head of state was sentenced to spend the rest of his life in prison.

Hidden graves in Mexico

In a more current case, the goal is to find the hidden graves in Mexico of people who disappeared after being kidnapped and were then murdered. Ball said his team is using a machine learning model to predict where searchers are likely to find those graves in order to focus and prioritize searches.

Since they have a lot of information, his team decided to randomly split the cases into test and training sets and then train a model. “We’ll predict the test data and then we’ll iterate that split, train, test process 1,000 times,” he explained. “What we’ll find is that over the course of four years that we’ve been looking at, more than a third of the time we can perfectly predict the counties that have graves.”

“Machine learning models are really good at predicting things that are like the things they were trained on,” Ball said.
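The split, train, test loop Ball describes can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the nearest-centroid classifier is a stand-in chosen to keep the example short (the talk does not say which model the team uses), and every function name here is hypothetical.

```python
import random
from statistics import mean


def make_synthetic_counties(n, rng):
    """Build (features, has_graves) pairs. Purely synthetic, not real data."""
    rows = []
    for _ in range(n):
        label = rng.random() < 0.3
        center = 2.0 if label else 0.0
        features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        rows.append((features, label))
    return rows


def fit_centroids(train):
    """Stand-in model: the mean feature vector of each class seen in training."""
    centroids = {}
    for label in (True, False):
        points = [f for f, y in train if y == label]
        if points:
            centroids[label] = tuple(mean(col) for col in zip(*points))
    return centroids


def predict(centroids, features):
    """Predict the class whose centroid is closest to the given features."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def repeated_split_evaluation(rows, n_iter=1000, test_fraction=0.25, seed=0):
    """Repeat a random train/test split; return mean accuracy and the share
    of iterations in which every test case was predicted correctly."""
    rng = random.Random(seed)
    n_test = max(1, int(len(rows) * test_fraction))
    accuracies = []
    for _ in range(n_iter):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit_centroids(train)
        correct = sum(predict(model, f) == y for f, y in test)
        accuracies.append(correct / len(test))
    perfect_share = sum(a == 1.0 for a in accuracies) / n_iter
    return mean(accuracies), perfect_share
```

Calling `repeated_split_evaluation(make_synthetic_counties(200, random.Random(1)))` returns the average test accuracy and the fraction of iterations with a perfect test-set prediction, which is the kind of summary Ball quotes. Repeating the split many times shows how stable the result is, rather than relying on one lucky or unlucky partition.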

A machine learning model can visualize the probability of finding mass graves by county, which generates press attention and helps with the advocacy campaign to bring state authorities into the search process, he said.

“That’s machine learning, contributing positively to society,” he said. Yet, that doesn’t mean that machine learning is necessarily positive for society as a whole.

Predictive Policing

Many machine learning applications “are terribly detrimental to human rights and society,” Ball stressed. In his final example, he talked about predictive policing, which is the use of machine learning patterns to predict where crime is going to occur.

For example, Ball and his team looked at drug crimes in Oakland, California. He displayed a heat map of the density of drug use in Oakland, based on a public health survey, showing the highest drug use close to the University of California.

Ball and his colleagues re-implemented one of the most popular predictive policing algorithms to predict crimes based on this data. He then showed the model running in an animation, with dots on the grid representing drug arrests. The model made its predictions in precisely the same locations where the arrests had been observed, he said.

If the underlying data turns out to be biased, then “we recycle that bias. Now, biased data leads to biased predictions.” Ball went on to clarify that he was using the term bias in a technical, not racial sense.

When bias in data occurs, he said, it “means that we’re over predicting one thing and that we’re under predicting something else. In fact, what we’re under predicting here is white crime,” he said. Then the machine learning model teaches police dispatchers that they should go to the places they went before. “It assumes the future is like the past,” he said.
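A toy simulation makes this feedback loop concrete. It is not the algorithm Ball's team re-implemented; it is a deliberately simplified sketch that assumes patrols are always sent to the area with the most recorded arrests and that arrests can only be recorded where patrols are present. The function name and all numbers are hypothetical.

```python
def simulate_feedback(initial_arrests, true_rate, days, patrols_per_day=10):
    """Send every patrol to the area with the most recorded arrests so far.

    initial_arrests: recorded arrests per area before the simulation starts.
    true_rate: arrests recorded per patrol in each area (the underlying rate).
    Returns the recorded arrests per area after the given number of days.
    """
    recorded = list(initial_arrests)
    for _ in range(days):
        # "Prediction": tomorrow's hotspot is wherever past arrests are highest.
        hotspot = max(range(len(recorded)), key=lambda i: recorded[i])
        # Only the patrolled area can add to the arrest record.
        recorded[hotspot] += patrols_per_day * true_rate[hotspot]
    return recorded


# Two areas with the same underlying rate but slightly different histories.
result = simulate_feedback([12, 10], [0.5, 0.5], days=30)
```

In this sketch both areas have the same underlying rate, yet after 30 days the record reads 162 arrests in the first area and still 10 in the second. The small initial gap in the data, not any difference in behavior, determines where the model keeps sending police.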

“Machine learning in this context does not simply recycle racial disparities in policing, [it] amplifies the racial disparities in policing.” This, Ball said, “is catastrophic. Policing [is] already facing a crisis of legitimacy in the United States as a consequence of decades, or some might argue centuries, of unfair policing. ML makes it worse.”

“In predictive policing, a false positive means that a neighborhood can be systematically over policed, contributing to the perception of the citizens in that neighborhood that they’re being harassed. That erodes trust between the police and the community. Furthermore, a false negative means that police may fail to respond quickly to real crime,” he said.

When machine learning gets it wrong

Machine learning models produce variances and random errors, Ball said, but bias is a bigger problem. “If we have data that is unrepresentative of a population to which we intend to apply the model, the model is unlikely to be correct. It is likely to reproduce whatever that bias is in the input side.”

We want to know where a crime has occurred, “but our pattern of observation is systematically distorted. It’s not that [we] simply under-observe the crime, but under-observe some crime at a much greater rate than other crimes.” In the United States, he said, that tends to be distributed by race. Biased models are the end result of that.

A machine learning model that gets it wrong can also destroy people’s lives, Ball said, which raises the question of who bears the cost of being wrong. You can hear more from Ball and learn more about his work in the complete video presentation below.

Hilary Mason, general manager for machine learning at Cloudera, discussed AI in the real world in her keynote at the recent Open FinTech Forum.

We are living in the future – it is just unevenly distributed with “an outstanding amount of hype and this anthropomorphization of what [AI] technology can actually provide for us,” observed Hilary Mason, general manager for machine learning at Cloudera, who led a keynote on “AI in the Real World: Today and Tomorrow,” at the recent Open FinTech Forum.

AI has existed as an academic field of research since the mid-1950s, and if the forum had been held 10 years ago, we would have been talking about big data, she said. But, today, we have machine learning and feedback loops that allow systems to continue to improve with the introduction of more data.

Machine learning provides a set of techniques that fall under the broad umbrella of data science. AI has returned, from a terminology perspective, Mason said, because of the rise of deep learning, a subset of machine learning techniques based around neural networks that has provided not just more efficient capabilities but the ability to do things we couldn’t do at all five years ago.

Imagine the future

All of this “creates a technical foundation on which we can start to imagine the future,” she said. Her favorite machine learning application is Google Maps. Google is getting real-time data from people’s smartphones, then it is integrating that data with public data sets, so the app can make predictions based on historical data, she noted.

Getting this right, however, is really hard. Mason shared an anecdote about how her name is a “machine learning edge case.” She shares her name with a British actress who passed away around 2005 after a very successful career.

Late in her career, the actress played the role of an ugly witch, and a search engine from 2009 combined photos with text results. At the time, Mason was working as a professor, and her bio was paired with the actress’s picture in that role. “Here she is, the ugly hag… and the implication here is obvious,” Mason said. “This named entity disambiguation problem is still a problem for us in machine learning in every domain.”

This example illustrates that “this technology has a tremendous amount of potential to make our lives more efficient, to build new products. But it also has limitations, and when we have conferences like this, we tend to talk about the potential, but not about the limitations, and not about where things tend to go a bit wrong.”

Machine learning in FinTech

Large companies operating complex businesses have a huge amount of human and technical expertise on where the ROI in machine learning would be, she said. That’s because they also have huge amounts of data, generally created as a result of operating those businesses for some time. Mason’s rule of thumb when she works with companies is to find some clear ROI on a cost savings or process improvement using machine learning.

“Lots of people, in FinTech especially, want to start in security, anti-money laundering, and fraud detection. These are really fruitful areas because a small percentage improvement is very high impact.”

Other areas where machine learning can be useful are understanding your customers, churn analysis, and marketing techniques, all of which are pretty easy to get started in, she said.

“But if you only think about the ROI in the terms of cost reduction, you put a boundary on the amount of potential your use of AI will have. Think also about new revenue opportunities, new growth opportunities that can come out of the same technologies. That’s where the real potential is.”

Getting started

The first thing to do, she said, is to “drink coffee, have ideas.” Mason said she visits lots of companies and when she sees their list of projects, they’re always good ideas. “I get very worried, because you are missing out on a huge amount of opportunity that would likely look like bad ideas on the surface.”

It’s important to “validate against robust criteria” and create a broad sweep of ideas. Then, go through and validate capabilities. Some of the questions to ask include: is there research activity relevant to what you’re doing? Is there work in one domain you can transfer to another domain? Has somebody done something in another industry that you can use or in an academic context that you can use?

Organizations also need to figure out whether systems are becoming commoditized in open source, meaning “you have a robust software and infrastructure you can build on without having to own and create it yourself.” Then, the organization must figure out if data is available, either within the company or for purchase.

Then it’s time to “progressively explore the risky capabilities. That means have a phased investment plan,” Mason explained. In machine learning, this is done in three phases, starting with validation and exploration: Does the data exist? Can you build a very simple model in a week?

“At each [phase], you have a cost gate to make sure you’re not investing in things that aren’t ready and to make sure that your people are happy, making progress, and not going down little rabbit holes that are technically interesting, but ultimately not tied to the application.”

That said, Mason noted that predicting the future is, of course, very hard, so people write reports on different technologies that are designed to be six months to two years ahead of what they would put in production.

Looking ahead

As progress is made in the development of AI, machine learning and deep learning, there are still things we need to keep in mind, Mason said. “One of the biggest topics in our field right now is how we incorporate ethics, how we comply with expectations of privacy in the practice of data science.”

She gave a plug to a short, free ebook called “Data Driven: Creating a Data Culture,” that she co-authored with DJ Patil, who worked as chief data scientist for President Barack Obama. Their goal, she said, is “to try and get folks who are practicing out in the world of machine learning and data science to think about their tools [and] for them to practice ethics in the context of their work.”

Mason ended her presentation on an optimistic note, observing that “AI will find its way into many fundamental processes of the businesses that we all run. So when I say, ‘Let’s make it boring,’ I actually think that’s what makes it more exciting.”

You can watch the complete presentation below:

Enterprise open source adoption has its own set of challenges, but it becomes easier if you have a clear plan to follow. At Open FinTech Forum, Ibrahim Haddad provides guidelines based on proven practices.

2018 marks the year that open source disrupts yet another industry, and this time it’s financial services. The first-ever Open FinTech Forum, happening October 10-11 in New York City, focuses on the intersection of financial services and open source. It promises to provide attendees with guidance on building internal open source programs along with an in-depth look at cutting-edge technologies being deployed in the financial sector, such as AI, blockchain/distributed ledger, and Kubernetes.

Several factors make Open FinTech Forum special, but the in-depth sessions on day 1 especially stand out. The first day offers five technical tutorials, as well as four working discussions covering open source in an enterprise environment, setting up an open source program office, ensuring license compliance, and best practices for contributing to open source projects.

Enterprise open source adoption has its own set of challenges, but it becomes easier if you have a clear plan to follow. At Open FinTech, I’ll present a tutorial session called “Using Open Source: An Enterprise Guide,” which provides a detailed discussion on how to use open source. We’ll start by answering the question, “Why Open Source,” then discuss how to build an internal supporting infrastructure and look at some lessons learned from over two decades of enterprise open source experience. This session — run under the Chatham House Rule — offers a workshop-style environment that is a mix of presentation and discussion triggered by audience questions. The workshop is divided into five sections, explored below.

Why Open Source?

This question may seem trivial, but it’s a very important consideration that even the most mature open source companies revisit regularly. In this part of the workshop, we’ll examine seven key reasons why enterprises should engage with open source software, regardless of industry and focus, and how they can gain incredible value from such engagements.

The Importance of Open Source Strategy

Going through the exercise of establishing an open source strategy is a great way to figure out your company’s current position and its future goals with respect to open source. These strategy discussions will usually revolve around goals you’d like to achieve, along with why and how you’d like to achieve them. In this part of the tutorial, we discuss the many questions to consider when determining your open source strategy and tie that to your product and services strategy for a path to a better ROI.

Implementing an Open Source Infrastructure

Once you have identified your company’s open source strategy, you need to build infrastructure to support your open source efforts and investments. That infrastructure should act as an enabler for your efforts in using open source, complying with licenses, contributing to projects, and leading initiatives. In the workshop, I’ll present these various elements that together form an incredible enabling environment for your open source efforts.

Recommended Practices (17 of them)

When IBM pledged to spend $1 billion on Linux R&D back in 2000, it was a major milestone. IBM was a pioneer in the enterprise open source world, and the company had to learn a lot about working with open source software and the various communities. Other companies have since followed suit, and many more are now entering open source as it becomes the new normal of software development.  The question is: How can you minimize the enterprise learning curve on your own open source journey? We’ve got you covered. In this talk, we’ll explore 17 lessons learned from nearly two decades of enterprise experience with open source software.


Beyond implementing these best practices, open source adoption requires a cultural shift from traditional software development practices to a more open and collaborative mindset. Internal company dynamics need to be favorable to open source efforts. As an open source leader inside your organization, you will face several challenges in terms of funding resources, justifying ROI, getting upstream focus, etc. These challenges often require a major shift in mindset and a lot of education up the chain. We will explore various considerations relating to culture, processes, tools, continuity, and education to ensure you are on track to open source success in your organization.

We hope to see you at Open FinTech Forum for an informative and high-value event.

Sign up to receive updates on Open FinTech Forum:

Don’t miss Open FinTech Forum, October 10 and 11 in New York.

Join Open FinTech Forum: AI, Blockchain & Kubernetes on Wall Street next month to learn:

  • How to build internal open source programs
  • How to leverage cutting-edge open source technologies to drive efficiencies and flexibility

Blockchain Track:

Hear about the latest distributed ledger deployments, use cases, trends, and predictions of blockchain adoption. Session highlights include:

  • Panel Discussion: Distributed Ledger Technology Deployments & Use Cases in Financial Services – Jesse Chenard, MonetaGo; Umar Farooq, JP Morgan; Julio Faura, Santander Bank; Hanna Zubko, IntellectEU; Robert Hackett, Fortune Magazine
  • Enterprise Blockchain Adoption – Trends and Predictions – Saurabh Gupta, HfS Research
  • Blockchain Based Compliance Management System – Ashish Jadhav, Reliance Jio Infocomm Limited

Artificial Intelligence Track:

See how financial institutions are increasingly using AI and machine learning in a range of applications across the financial system including fraud detection, DDoS mitigation, marketing and usage pattern analysis. Session highlights include:  

  • Build Intelligent Applications with Azure Cognitive Service and CNTK – Bhakthi Liyanage, Bank of America
  • Will HAL Open the Pod Bay Doors? An (Enterprise FI) Decisioning Platform Leveraging Machine Learning – Sumit Daryani & Niraj Tank, Capital One
  • Using Text Mining and Machine Learning to Enhance the Credit Risk Assessment Process – Bruce Brenkus, Spotcap

Cloud Native & Kubernetes Track:

Learn how Kubernetes and other cloud native applications help provide integration and automation between development and deployment for platform or infrastructure as code. Session highlights include:

  • Panel Discussion: Real-World Kubernetes Use Cases in Financial Services: Lessons Learned from Capital One, BlackRock and Bloomberg – Steven Bower, Bloomberg; Michael Francis, BlackRock; Jeffrey Odom, Capital One; Paris Pittman, Google; Ron Miller, TechCrunch
  • Multi-tenancy and Tenant Isolation on Kubernetes – Michael Knapp & Andrew Gao, Capital One
  • Building a Banking Platform on Open Source & Containers to Achieve a Cloud Native Platform – Jason Poley, Barclays

Open FinTech Forum also offers deep dive sessions on building internal open source programs (governance, compliance, establishing an open source program office, contributing and more) as well as tutorials on blockchain, containers and cloud native.

Whether you are already using open source, or just getting started, Open FinTech Forum offers learnings, insights and connections that can help inform IT decision makers about the open technologies driving digital transformation and how to best utilize them.


Secure your spot now.


Keynotes announced for Open FinTech Forum, coming up October 10-11 in New York.

Announcing the initial lineup of financial services leaders keynoting at Open FinTech Forum!

Keynote Speakers Include:

  • Brian Behlendorf, Executive Director, Hyperledger
  • Sally Eaves, Chief Technology Officer, Strategic Adviser and Member of the Forbes Technology Council
  • Yuri Litvinovich, Senior Cloud Engineer, Scotiabank
  • Hilary Mason, General Manager of Machine Learning, Cloudera
  • Rob Palatnick, Managing Director and Chief Technology Architect, DTCC
  • Bob Sutor, Vice President for IBM Q Strategy and Ecosystem, IBM Research

Focusing on the intersection of financial services and open source, Open FinTech Forum will provide CIOs and senior technologists guidance on building internal open source programs and an in-depth look at cutting-edge open source technologies including AI, blockchain/distributed ledger and Cloud Native/Kubernetes that can be leveraged to drive efficiencies and flexibility.

The full event agenda will be announced on August 23.

Secure your spot now.


Linux Foundation members and LF project members receive a 20% discount on registration pricing. FinTech CIOs and senior technologists may receive a 50% discount on registration fees.

Email for discount codes.

deep learning

The LF Deep Learning Foundation is now accepting proposals for the contribution of projects.

I am very pleased to announce that the LF Deep Learning Foundation has approved a project lifecycle and contribution process to enable the contribution, support and growth of artificial intelligence, machine learning and deep learning open source projects. With these documents in place, the LF Deep Learning Foundation is now accepting proposals for the contribution of projects.

The LF Deep Learning Foundation, a community umbrella project of The Linux Foundation with the mission of supporting artificial intelligence, machine learning and deep learning open source projects, is working to build a self-sustaining ecosystem of projects.  Having a clear roadmap for how to contribute projects is a first step. Contributed projects operate under their own technical governance with collaboration resources allocated and provided by the LF Deep Learning Foundation’s Governing Board. Membership in the LF Deep Learning Foundation is not required to propose a project contribution.

The project lifecycle and contribution process documents can be found here: Note that sign-up to the general LF Deep Learning Foundation mailing list is required to access these materials.

If you are interested in contributing a project, please review the steps and requirements described in the above materials. We are very excited to see what kinds of innovative, forward-thinking projects the community creates.

If you have any questions on how to contribute a project or the types of support LF Deep Learning Foundation is providing to its projects, please reach out to me at

For more information on the LF Deep Learning Foundation, please visit

Acumos AI Challenge

The Acumos AI Challenge, presented by AT&T and Tech Mahindra, is an open source developer competition seeking innovative, ground-breaking AI solutions; enter now.

Artificial Intelligence (AI) has quickly evolved over the past few years and is changing the way we interact with the world around us. From digital assistants, to AI apps interpreting MRIs and operating self-driving cars, there has been significant momentum and interest in the potential for machine learning technologies applied to AI.

The Acumos AI Challenge, presented by AT&T and Tech Mahindra, is an open source developer competition seeking innovative, ground-breaking AI solutions from students, developers, and data scientists. We are awarding over $100,000 in prizes, including the chance for finalists to travel to San Francisco to pitch their solutions during the finals on September 11, 2018. Finalists will also have the chance to have their solutions featured in the Acumos Marketplace, gain exposure, and meet with AT&T and Tech Mahindra executives.

Acumos AI is a platform and open source framework that makes it easy to build, share, and deploy AI applications. The Acumos AI platform, hosted by The Linux Foundation, simplifies development and provides a marketplace for accessing, using and enhancing AI apps.  

We created the Acumos AI Challenge to enable and accelerate AI adoption and innovation, while recognizing developers who are paving the future of AI development. The Acumos AI Challenge seeks innovative AI models across all use cases. Some example use cases include, but are not limited to:

5G & SDN

Build an AI app that improves the overall performance and efficiencies of 5G networks and Software-Defined Networking.

Media & Entertainment

Build an AI model targeting a media or entertainment use case. Examples include solutions for:

  • Broadcast media, internet, film, social media, and ad campaign analysis
  • Video and image recognition, speech and sound recognition, video insight tools, etc.


Security

Build an AI app around network security use cases such as advanced threat protection, cyber security, IoT security, and more.

Enterprise Solutions

Build an AI model targeting an enterprise use case, including solutions for Automotive, Home Automation, Infrastructure, and IoT.

Since it is so easy to onboard new models into Acumos, there is a nearly infinite number of use cases to consider that can benefit consumers and businesses across a multitude of disciplines. When submitting your entry, we encourage you to consider all scenarios that you are passionate about.

The Acumos AI Challenge will be accepting submissions from May 31 to August 5, 2018. Teams are required to submit a working AI model, test dataset, and a demo video under 3 minutes. Register your team for the Challenge beginning May 31, 2018. We encourage you to register early so that you can begin to plan and build your solution and create your demo video.

Prize Packages

Register today and submit your AI solution for a chance to be one of the top three teams to pitch their app at the Palace of Fine Arts in San Francisco on September 11, 2018. The top three teams will each receive:

  • $25,000 Cash
  • Trip to the finals in San Francisco, including air and hotel (for two team members)
  • Meetings with AT&T and Tech Mahindra executives
  • AI Solution featured in Acumos Marketplace

The team that wins the finale will take home an additional $25,000 grand prize, for a total of $50,000.

We look forward to your entry and hope to see you in San Francisco in September!


open source AI

Download this new ebook to learn about some of the most successful open source AI projects.

Open source AI is flourishing, with companies developing and open sourcing new AI and machine learning tools at a rapid pace. To help you keep up with the changes and stay informed about the latest projects, The Linux Foundation has published a free ebook by Ibrahim Haddad examining popular open source AI projects, including Acumos AI, Apache Spark, Caffe, TensorFlow, and others.

“It is increasingly common to see AI as open source projects,” Haddad said. And, “as with any technology where talent premiums are high, the network effects of open source are very strong.”

Open Source AI: Projects, Insights, and Trends looks at 16 open source AI projects – providing in-depth information on their histories, codebases, and GitHub contributions. In this 100+ page book, you’ll gain insights about the various projects as well as the state of open source AI in general. Additionally, the book discusses the importance of project incubators, community governance, and project consolidation, and presents some observations on common characteristics among the surveyed projects.

For each of the projects examined, the book provides a detailed summary offering basic information, observations, and pointers to web and code resources.  If you’re involved with open source AI, this book provides an essential guide to the current state of open source AI.

Download the ebook now to learn more about the most successful open source AI projects and read what it takes to build your own successful community.

open source AI

We look at three open source AI projects aimed at simplifying access to AI tools and insights.

At the intersection of open source and artificial intelligence, innovation is flourishing, and companies ranging from Google to Facebook to IBM are open sourcing AI and machine learning tools.

According to research from IT Intelligence Markets, the global artificial intelligence software market is expected to reach 13.89 billion USD by the end of 2022. However, talk about AI has accelerated faster than actual deployments. According to a detailed McKinsey report on the growing impact of AI, “only about 20 percent of AI-aware companies are currently using one or more of its technologies in a core business process or at scale.” Here, we look at three open source AI projects aimed at simplifying access to AI tools and insights.


Google has open sourced a software framework called TensorFlow that it spent years developing to support its AI software and other predictive and analytics programs. TensorFlow is the engine behind several Google tools you may already use, including Google Photos and the speech recognition found in the Google app.

Google has also released two new AIY kits that let individuals easily get hands-on with artificial intelligence. Focused on computer vision and voice assistants, the two kits come as small self-assembly cardboard boxes with all the components needed for use. The kits are currently available at Target in the United States, and, notably, are both based on the open source Raspberry Pi platform, which is more evidence of how much is going on at the intersection of open source and AI.

Sparkling Water, formerly known as OxData, has carved out a niche in the machine learning and artificial intelligence arena, offering platform tools as well as Sparkling Water, a package that works with Apache Spark.’s tools, which you can access simply by downloading, operate under Apache licenses, and you can run them on clusters powered by Amazon Web Services (AWS) and others for just a few hundred dollars. Never before has this kind of AI-focused data sifting power been so affordable and easy to deploy.

Sparkling Water includes a toolchain for building machine learning pipelines on Apache Spark. In essence, Sparkling Water is an API that allows Spark users to leverage H2O’s open source machine learning platform instead of or alongside the algorithms that are included in Spark’s existing machine-learning library. has published several use cases for how Sparkling Water and its other open tools are used in fields ranging from genomics to insurance, demonstrating that organizations everywhere can now leverage open source AI tools.’s Vinod Iyengar, who oversees business development at the company, says they are working to bring the power of AI to businesses. “Our machine learning platform features advanced algorithms that can be applied to specialized use cases and the wide variety of problems that organizations face,” he notes.

Just as open source focused companies such as Red Hat have combined commercial products and services with free and open source ones, H2O.ai is exploring the same model on the artificial intelligence front. Driverless AI is a new commercial product from H2O.ai that aims to ease AI and data science tasks at enterprises. With Driverless AI, non-technical users can gain insights from data, optimize algorithms, and apply machine learning to business processes. Note that, although it leverages tools with open source roots, Driverless AI is a commercial product.


Acumos is another open source project aimed at simplifying access to AI. Acumos AI, which is part of the LF Deep Learning Foundation, is a platform and open source framework that makes it easy to build, share, and deploy AI apps. According to the website, “It standardizes the infrastructure stack and components required to run an out-of-the-box general AI environment. This frees data scientists and model trainers to focus on their core competencies and accelerates innovation.”

The goal is to make these critical new technologies available to developers and data scientists, including those who may have limited experience with deep learning and AI. Acumos also has a thriving marketplace where you can grab and deploy applications.

“An open and federated AI platform like the Acumos platform allows developers and companies to take advantage of the latest AI technologies and to more easily share proven models and expertise,” said Jim Zemlin, executive director at The Linux Foundation. “Acumos will benefit developers and data scientists across numerous industries and fields, from network and video analytics to content curation, threat prediction, and more.”

Luis Camacho Caballero is working on a project to preserve endangered South American languages by porting them to computational systems through automatic speech recognition using Linux-based systems. He was one of 14 aspiring IT professionals to receive a 2016 Linux Foundation Training (LiFT) scholarship, announced last month.  

Luis, who is from Peru, has been using Linux since 1998, and appreciates that it is built and maintained by a large number of individuals working together to increase knowledge. Through his language preservation project, he hopes to have the first language, Quechua, the language of his grandparents, completed by the end of 2017, and then plans to expand to other Amazonian languages.



Luis Camacho Caballero has started a project to preserve endangered South American languages through automatic speech recognition using Linux-based systems.

Can you tell me more about Quechua, the language of your parents and grandparents?

Luis Camacho Caballero: Quechua was the lingua franca of the South American Andes between the 5th and 16th centuries. It’s strongly associated with Inca culture (1300 AD – 1550 AD) but is clearly older than that. It is still alive and used by about 8 million people distributed among Ecuador, Peru, and Bolivia. However, it’s at risk of extinction because, in practice, the only language supported by the government is Spanish. Don’t misunderstand: of course there is a national agency for heritage preservation, but it hasn’t gained momentum yet. The process of substitution is running faster and stronger than the preservation initiatives.

It’s a shame that I speak just a bit. You can taste a piece of Quechua in these funny clips: 1, 2 and 3 and even hear some famous songs here: Heaven, The Way You Make Me Feel, and bonus track.

What is your process for recording and digitizing the language?

Luis: It’s a hard process. Basically, it is composed of two parts: building a text/voice corpus and the language processing itself.

In regard to the first part, the challenges are 1) linking both corpora to get an exact match of voice and text, and 2) to make the corpora more useful, doing part-of-speech tagging, or POS-tagging, in which information about each word’s part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags.
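Editor’s illustration: the two corpus-building steps described above can be pictured with a minimal Python sketch. It is not the project’s code. The `Utterance` record, the file name, the timestamps, the English sample sentence, and the toy lexicon are all hypothetical, and a real tagger would be trained on annotated data rather than use a lookup table.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy lexicon mapping words to part-of-speech tags.
TOY_LEXICON = {
    "the": "DET",
    "dog": "NOUN",
    "runs": "VERB",
    "fast": "ADV",
}


@dataclass
class Utterance:
    """One aligned entry linking a stretch of audio to its transcript."""
    audio_file: str    # hypothetical path to the recording
    start_sec: float   # where the utterance starts in the audio
    end_sec: float     # where the utterance ends in the audio
    transcript: str    # the text spoken in that stretch


def pos_tag(transcript: str) -> List[Tuple[str, str]]:
    """Pair each word with a tag; words not in the lexicon get 'X'."""
    return [(word, TOY_LEXICON.get(word.lower(), "X"))
            for word in transcript.split()]


# Step 1: link voice and text. Step 2: add POS tags to the text.
utt = Utterance("recording_001.wav", 0.0, 1.8, "The dog runs fast")
tagged = pos_tag(utt.transcript)
print(utt.audio_file, utt.start_sec, utt.end_sec, tagged)
```

Keeping the audio span and the tagged transcript together in one record is only one possible layout; it is chosen here because it makes the voice-to-text link explicit.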

For the automatic speech recognition (ASR) itself, we are testing artificial intelligence algorithms, looking for the one that best matches the features of the Quechua language.

How did you get involved in this work?

Luis: Ever since I was first exposed to English ASR, maybe six years ago, I knew that I had to do ASR for Quechua. It’s my contribution to preserving my heritage.

Is this a hobby, or a job for you?

Luis: Nowadays I am with PUCP. I wrote a proposal and fortunately it was funded by the Peruvian Science Foundation, so I have resources for developing this project until Christmas 2017. Part of my job is networking with all the stakeholders and looking for more funds until we reach a complete ASR system, one at the same level as well-supported languages like English.

How do you plan to use your LiFT scholarship?

Luis: Linux is a wonderful platform; almost all language computational portability technology is developed on Linux. I’ve not decided yet which course fits my current needs for Linux support.

How will the scholarship help you?

Luis: I think the scholarship will help me in at least two ways: 1) getting in touch with the most renowned expert Linux trainers and 2) gaining valuable knowledge that would otherwise be expensive or inaccessible.

