We have made changes to our privacy policy.

Accelerating Python: From Research to Production

21 September 2023 | 6 minutes

By Steve Wilcockson

If you’re reading this, you likely want to add speed and performance to your Python code and notebooks while also reducing time from research to production.

In our whitepaper Accelerate Your Python Development Journey With PyKX, we’ve compiled a series of use cases, technical examples, and some “Getting Started” guides to introduce PyKX and how it provides interoperability between Python and kdb.

Having spent 25 years with quants, data scientists, and, more recently, data engineers, architects, and CTOs, I’ve seen Python emerge from a fringe functional language into a functional language powerhouse, driving data science and AI forward. I witnessed its rise in the late 2000s, and now work for an organization that embraces its culture and helps Python achieve new things in production big data analytics.

Python from the Other Side

I joined KX in May 2022. Prior to this I’d worked with two technologies strongly impacted by the rising tide of Python.

For many years, I worked with MATLAB, which, between 2010 and 2015, was the language of data science before Python (and arguably R) took over.

MATLAB is the language that Geoffrey Hinton used to explore neural nets (I had the pleasure of speaking with a university secretary when he needed an academic license). And Andrew Ng based his first online data science course around it. In fact, numerous other greats of the industry built their neural nets models and developed machine learning and statistical functionality with MATLAB.

I then worked with Java, the most popular language of the enterprise and in computer science education, until Python took over, in popularity terms at least.

I specifically note November 2020 when the influential Tiobe Index placed Python above Java for the first time, a trajectory that has not (yet) reverted. When I speak with practitioners, I increasingly hear phrases like “We no longer teach Java in our Computer Science course; we’re using Python primarily,” “Java is for us a legacy, albeit a useful one, but the new stuff is primarily Python,” and “Docker and Kubernetes changed everything – Java is no longer required, and I can more easily take languages like Python into cloud and microservices environments.”

It was tough to take because I was loyal to the languages I served – MATLAB and then Java – but I really loved Python’s focus on open source and community. On one hand, Python was simple. I give you some code, and you run it in your Python IDE or notebook. You didn’t have to “buy” a license for something.

But there was much more besides. If the community wanted something, it built it. Indeed, the reason I believe Python took off was because a chap called Wes Mckinney developed Pandas in 2008.

Pandas is essentially a data management and analysis application that wraps up low-level mathematical optimizations and operations of key Python package NumPy into simple data transformations, joins, aggregations, and analyses.

In short, Pandas made Python usable for everyday numerical applications, which happened to include big statistical calculations. Pretty soon, if not overnight, those who were from a world where “My professors taught me MATLAB during my master’s degree” were now in a situation where they could say: “So I studied 76,948 Python code repositories to teach myself Python.”

Python: The Language of Data Science

That key bit of functionality Wes McKinney developed gave coders a great technical reason, beyond “it’s free,” to build statistical, time-series, and machine-learning applications in Python. It changed the game.

What happened next? Well, Pandas invigorated the development and application of scikit-learn, the key machine-learning package in the Python ecosystem. In the meantime, Big Tech built and open-sourced deep learning code bases, for example, Facebook with PyTorch, Google with Tensorflow, Berkeley with Caffe.

For those familiar with OpenAI and LLaMa, PyTorch is the training giant on which LLM giants stand.

Thus, Python is now the language of Generative AI implementation, which is only natural as it’s been the language of data science, machine, and deep learning for 10 to 15 years.

But there’s a massive catch in the story I just told.

Python is the front-end, the thing people see and interact with. It’s a hub for data sets, running and sharing understandable code, often as Notebooks. The glue that binds the user experience together – allowing rapid, simple, reproducible prototyping and data science.

However, those deep learning libraries were mostly implemented in C++ or C. The predominant big and fast data platforms – those that stuck the course like Spark, Kafka, and Snowflake – were implemented, if not in C++, then in Java. The ecosystem that gives Python wings is commonly not Pythonic, but Python beautifully front-ends it.

Python: The Culture of Data Science

But Python is more than just a programming front-end, for standalone notebooks or extensive code bases powered by, and integrating with, a multitude of other technologies. It’s a vibrant culture, a global community, a Zen, a way of life.

Why KX?

So why PyKX? From a practitioner level, PyKX is a data and analytics technology that interoperates Python with KX production data management and analytics to give Python wings.

It’s supremely powerful, providing orders of magnitude greater efficiency and performance than Python and Pandas for key tasks, meaning it’s possible to take the best bits of both for something production-worthy sooner.

Pandas and KX are built from the same foundational mathematical and computational constructs.  The two technologies naturally align.

Incidentally, going back to my MATLAB days, while Python became the time series, matrix algebra, and vector processing tool of choice for research, in the sector where I operated (financial services), KX rapidly grew to be the same in production. I was squeezed on both sides! And I loved it! And the two environments work together.

From a cultural perspective too, KX is an organization that gets open source. I work with committed Pythonistas, which I love. They live the open-source mentality and participate in the Python communities, often alongside KX users who are open-source aficionados.

And that culture matters.

Read the whitepaper to learn how the technologies of Python and KX help accelerate performance, improve efficiency, and significantly reduce time to production.

To learn more about KX technologies, vector databases and time series analytics, sign up to your local Vectorize World Tour Community Meetup.

 

Demo kdb, the fastest time-series data analytics engine in the cloud








    For information on how we collect and use your data, please see our privacy notice. By clicking “Download Now” you understand and accept the terms of the License Agreement and the Acceptable Use Policy.