Apache Parquet

columnar storage, big data, compression, file format, storage

Open jobs: 145

Companies looking for Parquet: 141

Back in the day, when data lakes were threatening to drown us all in unstructured chaos, along came Apache Parquet. The promise? Columnar storage to rescue our queries from the abysmal depths of full table scans. Instead of writing each record's fields one after another, Parquet stores each column's values together, so a query that needs three columns out of fifty reads only those three rather than everything. Quite clever, really, though one does wonder whether everyone truly understood the implications for schema evolution.
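To make the column-skipping idea concrete, here is a toy model in plain Python (standard library only). Parquet splits rows into row groups, stores each row group column by column, and keeps per-column min/max statistics in the file footer; the sketch mimics what Parquet-aware engines do with that layout: read only the requested columns, and skip whole row groups whose statistics show they cannot match a filter. The names `ColumnChunk`, `RowGroup`, `write_row_groups` and `read` are hypothetical. Real files are binary, with encoded and compressed column chunks, and in practice you would use a library such as pyarrow rather than anything like this.

```python
from dataclasses import dataclass

# Toy model only: plain Python lists stand in for the binary, encoded
# column chunks and footer metadata of a real Parquet file.


@dataclass
class ColumnChunk:
    values: list
    min_value: object  # statistics a real file keeps in its footer
    max_value: object


@dataclass
class RowGroup:
    columns: dict  # column name -> ColumnChunk


def write_row_groups(rows, row_group_size):
    """Split row dicts into row groups and store each group column by column."""
    groups = []
    for start in range(0, len(rows), row_group_size):
        batch = rows[start:start + row_group_size]
        chunks = {}
        for name in batch[0]:
            values = [row[name] for row in batch]
            chunks[name] = ColumnChunk(values, min(values), max(values))
        groups.append(RowGroup(chunks))
    return groups


def read(groups, columns, filter_column, lo, hi):
    """Return only `columns` for rows where lo <= filter_column <= hi.

    Row groups whose min/max statistics rule out a match are skipped
    without touching their data; in the remaining groups, only the
    filter column and the requested columns are read.
    """
    result = {name: [] for name in columns}
    groups_read = 0
    for group in groups:
        stats = group.columns[filter_column]
        if stats.max_value < lo or stats.min_value > hi:
            continue  # skipped using statistics alone
        groups_read += 1
        keep = [lo <= v <= hi for v in stats.values]
        for name in columns:
            chunk = group.columns[name].values
            result[name].extend(v for v, k in zip(chunk, keep) if k)
    return result, groups_read


rows = [{"id": i, "city": "Berlin" if i % 2 else "Paris", "amount": i * 10}
        for i in range(10)]
groups = write_row_groups(rows, row_group_size=4)  # ids 0-3, 4-7, 8-9
result, groups_read = read(groups, ["id", "amount"], "id", 5, 6)
print(result)       # {'id': [5, 6], 'amount': [50, 60]}
print(groups_read)  # 1
```

Only one of the three row groups is read, and the `city` column is never touched at all; on a wide table stored in object storage, that difference is where most of the speedup over a full scan comes from.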

The reality? It's become a de facto standard for anything touching Spark, Hive, or Presto. Forget CSV; Parquet's where the cool kids hang out. It's not a silver bullet, mind you. Lots of small files can be a nightmare, because every file carries its own metadata and the engine has to list and open each one. And since a Parquet file can't be modified in place, workloads dominated by row-level updates or single-record lookups are a poor fit. But compared to row-oriented formats or even older columnar solutions, it offers a compelling balance of compression, performance, and ecosystem support. Today, if you're running analytical workloads without it, you're probably doing it wrong, or at least making your data engineers weep softly into their lattes.
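The compression side of the story is easier to see with a column in hand. Storing a column's values together puts similar values next to each other, which is what Parquet's dictionary encoding and its run-length/bit-packing hybrid encoding exploit. The stdlib-only sketch below (the function names are hypothetical) shows just the two core ideas on a sorted, low-cardinality column; the real format also bit-packs the indices and can then apply a compression codec such as Snappy, gzip, or ZSTD on top.

```python
def dictionary_encode(values):
    """Replace each value with a small integer index into a dictionary."""
    dictionary = []
    index_of = {}
    indices = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for i in indices:
        if runs and runs[-1][0] == i:
            runs[-1][1] += 1
        else:
            runs.append([i, 1])
    return runs


def decode(dictionary, runs):
    """Expand the runs and map indices back to the original values."""
    return [dictionary[i] for i, n in runs for _ in range(n)]


# A sorted, low-cardinality column: the best case for columnar encoding.
country = ["DE"] * 4 + ["FR"] * 3 + ["GB"] * 5
dictionary, indices = dictionary_encode(country)
runs = run_length_encode(indices)
print(dictionary)  # ['DE', 'FR', 'GB']
print(runs)        # [[0, 4], [1, 3], [2, 5]]
assert decode(dictionary, runs) == country  # the encoding is lossless
```

Twelve strings shrink to three dictionary entries and three runs. An unsorted column of mostly distinct values gets little out of either step, which is one reason sort order and cardinality matter when writing Parquet.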

Jobs using Apache Parquet

Junior Data Engineer @ burson (GB) | data engineer | full time | listed 2025-12-26 (original listing 2025-12-05)
A Junior Data Engineer role at Burson involves supporting data pipelines and AI models in a hybrid London setting. The position emphasizes Python scripting, Azure cloud infrastructure, and collaboration within a diverse team; its main differentiator is the blend of AI support and DevOps responsibilities. The main risk is that evolving data infrastructure needs could outpace initial skill sets. The role is hands-on and technical, suits candidates with a computer science background or practical experience, and does not specify a salary. (Generated content)
Technology used: AI/ML, DevOps, Python, Azure, Computer Science, BI, Agile/Scrum, Git, API, Power BI, Azure DevOps, Java, R, DAX, NoSQL, SQL, MongoDB, Parquet

Senior Data Engineer - (Genetics) Maternity Cover - 12 months FTC @ our-future-health-uk (GB) | data engineer | full time | listed 2025-12-22 (original listing 2025-07-02)
This role is for a Senior Data Engineer specializing in genetic data processing, responsible for building and maintaining robust pipelines for data storage and release. The position requires expertise in cloud-based data engineering, genetic data handling, and industry-standard tools, including Apache Parquet, Delta tables, Docker, Kubernetes, Spark, and Git/GitHub. While the role emphasizes practical, real-world data engineering, the listing does not make clear whether the company is acting as an intermediary (e.g., contractor, agency, reseller). No salary range is given, and the description highlights nothing distinctive about the role beyond typical data engineering skills. (Generated content)
Technology used: Data Engineering, CI/CD, Agile/Scrum, Cloud Computing, Python, Unix, Azure, Parquet, Delta, Docker, Kubernetes, Spark, Databricks, Git, GitHub
