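Trino Python Client: A Complete Guide

March 7, 2025 | 17 min read

Trino is a fast, distributed SQL query engine for running SQL over large datasets. Its Python client lets scripts and applications connect to a Trino cluster, execute queries, and process the results. This article covers the essentials of using the Trino Python client.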

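What Is Trino?

Trino, formerly known as PrestoSQL, is a high-speed distributed SQL query layer that can access data in Hadoop, S3, MySQL, PostgreSQL, and other data sources. It lets users query large-scale data with SQL where the data lives, without first moving it into a separate system.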

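Key Features of the Trino Python Client

The main features of the Trino Python client, in brief:

Connecting to Trino: lets you connect to a Trino server from a Python application.
Running SQL queries: you can use SQL to pull information from different kinds of data storage systems.
Parameterized queries: supports parameterized queries, which help prevent SQL injection.
Fetching results: you can retrieve query results in different ways (one row, several rows, or all rows).
Error handling: query errors are raised as exceptions that you can catch and handle.
Session properties: you can set session properties to customize query behavior (for example, memory usage limits).
Prepared statements: supports prepared statements, so the same SQL query can be reused with different inputs.
Pagination: lets you fetch large result sets in smaller, manageable chunks.
Authentication: identifies the user by username and supports further mechanisms, such as basic (username and password) authentication, for secure connections.
Integration: works well with Python libraries such as Pandas for data analysis.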

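Why Use the Trino Python Client?

The Trino Python client simplifies interaction with a Trino cluster, letting Python applications run SQL queries and retrieve the results. Reasons to use it include:

Seamless integration: if your application is written in Python, the client fits in directly, with no need to switch languages.
Programmatic query execution: run and automate SQL queries from Python scripts or applications.
Result handling: fetch and process query results in Python, then hand them to data analysis and visualization libraries such as Pandas or Matplotlib.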

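Setting Up the Trino Python Client

Before using the Trino client, install the package with `pip`: `pip install trino`.

Make sure you have access to a running Trino cluster and the necessary credentials.

Connecting to Trino

To connect to a Trino cluster, supply the host and port of the Trino server along with any required account details.

Syntax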

 
import trino
# Create a Trino connection
conn = trino.dbapi.connect(
    host='your-trino-host',
    port=8080,  # replace with the port your Trino server listens on
    user='your-username',
    catalog='your-catalog',
    schema='your-schema'
)

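`host`: the host where Trino is running.
`port`: the port Trino is listening on (`8080` by default).
`user`: the username the queries run as.
`catalog`: the Trino catalog to query (for example `hive` or `mysql`).
`schema`: the specific schema within the catalog to query.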
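
Connection settings like these are often kept out of the source code. The sketch below shows one possible way to collect them from environment variables into keyword arguments for `trino.dbapi.connect()`. The helper name `connection_kwargs` and the `TRINO_*` variable names are assumptions made for this illustration; they are not defined by the client.

```python
import os


def connection_kwargs(env=None):
    """Build keyword arguments for trino.dbapi.connect() from a mapping.

    The TRINO_* names are hypothetical; use whatever fits your deployment.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("TRINO_HOST", "localhost"),
        "port": int(env.get("TRINO_PORT", "8080")),  # ports must be integers
        "user": env.get("TRINO_USER", "your-username"),
        "catalog": env.get("TRINO_CATALOG", "tpch"),
        "schema": env.get("TRINO_SCHEMA", "sf1"),
    }


# Pass an explicit mapping here so the example does not depend on the real environment.
kwargs = connection_kwargs({"TRINO_HOST": "trino.example.internal", "TRINO_PORT": "8443"})
print(kwargs)
# With the client installed, the result can be unpacked into the call:
#   conn = trino.dbapi.connect(**kwargs)
```

Keeping the settings in one dictionary makes it easy to reuse the same values across scripts.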

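The following is a basic example of using the Trino Python client.

Example: Querying a Simple Table

In this example we query the `orders` table in the `sf1` schema of the `tpch` catalog, which contains sample data, and fetch a few rows from it.

Step 1: Install the Trino Python client

Make sure the Trino client is installed. You can install it with `pip install trino`.

Step 2: Python code for a basic query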

 
import trino
# Create a connection to the Trino server
conn = trino.dbapi.connect(
    host='localhost',      # Trino server hostname
    port=8080,             # Default Trino port
    user='your_username',  # Trino username
    catalog='tpch',        # Catalog (e.g., tpch)
    schema='sf1'           # Schema (e.g., sf1)
)
# Create a cursor object to execute SQL queries
cur = conn.cursor()
# Execute a simple SQL query to fetch the first 5 rows from the 'orders' table
cur.execute("SELECT orderkey, custkey, totalprice FROM orders LIMIT 5")
# Fetch all rows from the executed query
rows = cur.fetchall()
# Print the fetched rows
for row in rows:
    print(row)

Output

 
(1, 370, 173665.47)
(2, 781, 46929.18)
(3, 1234, 193846.25)
(4, 1369, 32151.78)
(5, 445, 144659.20)

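Explanation

Connection setup: we connect to the Trino server with the `dbapi.connect()` method, which takes the host, port, catalog, schema, and user parameters.
Executing the query: the SQL statement `"SELECT orderkey, custkey, totalprice FROM orders LIMIT 5"` retrieves three columns from the `orders` table and returns at most five rows.
Fetching results: the `cur.fetchall()` method fetches all rows, and the `for` loop prints each one.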
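
Besides `fetchall()`, a DB-API cursor also offers `fetchone()` and `fetchmany()`. The sketch below shows how the three calls consume a result set in turn. Since a Trino server may not be at hand, it uses Python's built-in `sqlite3` module as a stand-in (an assumption made for this illustration); the same three methods are available on a Trino cursor.

```python
import sqlite3  # stand-in for a Trino connection; both follow the Python DB-API

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (orderkey INTEGER, custkey INTEGER)")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 10) for i in range(1, 8)])

cur.execute("SELECT orderkey, custkey FROM orders ORDER BY orderkey")
first = cur.fetchone()    # a single row, or None once the results are exhausted
batch = cur.fetchmany(3)  # up to three further rows
rest = cur.fetchall()     # every row that has not been read yet

print(first)  # (1, 10)
print(batch)  # [(2, 20), (3, 30), (4, 40)]
print(rest)   # [(5, 50), (6, 60), (7, 70)]

cur.close()
conn.close()
```

Each call continues where the previous one stopped, so the three results together cover all seven rows exactly once.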

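Common Query Types with the Trino Python Client

The following are common types of queries you can run with the Trino Python client:

CREATE TABLE queries
SELECT queries
Aggregation queries
JOIN queries
INSERT queries
UPDATE queries
DELETE queries
Parameterized queries
DROP TABLE queries

Creating a Table

A `CREATE TABLE` query defines a new table and its structure, including column names, data types, and any constraints the connector supports. It determines how the table will store its data.

Example code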

 
import trino
# Establish connection to the Trino server
conn = trino.dbapi.connect(
    host='localhost',      # Replace with your Trino server hostname
    port=8080,             # Trino port
    user='your_username',  # Replace with your username
    catalog='memory',      # Catalog (e.g., memory for in-memory storage)
    schema='default'       # Schema (e.g., default)
)
# Create a cursor object
cur = conn.cursor()
# 1. Create a table named 'customers' with some basic columns
create_table_query = """
CREATE TABLE IF NOT EXISTS customers (
    customer_id INTEGER,
    first_name VARCHAR,
    last_name VARCHAR,
    email VARCHAR,
    created_at TIMESTAMP
)
"""
cur.execute(create_table_query)
print("Table 'customers' created successfully.")
# 2. Insert data into the 'customers' table
insert_data_query = """
INSERT INTO customers (customer_id, first_name, last_name, email, created_at)
VALUES
    (1, 'James', 'White', 'James@gmail.com', CURRENT_TIMESTAMP),
    (2, 'Stacey', 'Woods', 'Woods.j@gmail.com', CURRENT_TIMESTAMP),
    (3, 'Ken', 'Johns', 'Johns.k@gmail.com', CURRENT_TIMESTAMP)
"""
cur.execute(insert_data_query)
print("Data inserted into 'customers' table successfully.")
# 3. Select data from the 'customers' table to verify
cur.execute("SELECT * FROM customers")
print("\nData from 'customers' table:")
for row in cur.fetchall():
    print(row)
# 4. Drop the 'customers' table (cleanup)
cur.execute("DROP TABLE IF EXISTS customers")
print("\nTable 'customers' dropped successfully.")
# Close the cursor and connection
cur.close()
conn.close()

Output

 
Table 'customers' created successfully.
Data inserted into 'customers' table successfully.

Data from 'customers' table:
(1, 'James', 'White', 'James@gmail.com', '2024-10-10 15:32:05.123456')
(2, 'Stacey', 'Woods', 'Woods.j@gmail.com', '2024-10-10 15:32:05.123456')
(3, 'Ken', 'Johns', 'Johns.k@gmail.com', '2024-10-10 15:32:05.123456')

Table 'customers' dropped successfully.

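Explanation

CREATE TABLE
- Creates a new table named `customers` with the columns `customer_id` (INTEGER), `first_name`, `last_name`, and `email` (VARCHAR), and `created_at` (TIMESTAMP).
- The `IF NOT EXISTS` clause ensures the table is created only if it does not already exist.
INSERT INTO
- Inserts three dummy records into the `customers` table.
- The `CURRENT_TIMESTAMP` function fills the `created_at` column with the current timestamp.
SELECT
- Retrieves all rows of the `customers` table to verify the inserted records.
DROP TABLE
- Drops the `customers` table to clean up the environment after use.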

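SELECT Queries

Basic SELECT: retrieves specific columns from a table.

Syntax: `SELECT column1, column2 FROM table_name`

SELECT with a condition: filters the results with `WHERE`.

Syntax: `SELECT column1, column2 FROM table_name WHERE condition`

SELECT with sorting: sorts the results with `ORDER BY`.

Syntax: `SELECT column1, column2 FROM table_name ORDER BY column1 DESC`

Limiting results: caps the number of rows a query returns.

Syntax: `SELECT column1, column2 FROM table_name LIMIT n`

Example

We use the `orders` table from the TPC-H sample dataset (the `tpch.sf1` schema).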

 
import trino
# Establish connection to the Trino server
conn = trino.dbapi.connect(
    host='localhost',      # Replace with your Trino server hostname
    port=8080,             # Trino port
    user='your_username',  # Replace with your username
    catalog='tpch',        # Catalog (e.g., tpch)
    schema='sf1'           # Schema (e.g., sf1)
)
# Create a cursor object
cur = conn.cursor()
# 1. Basic SELECT Query: Fetch specific columns from the 'orders' table
cur.execute("SELECT orderkey, custkey FROM orders LIMIT 5")
print("Basic SELECT Query Results:")
for row in cur.fetchall():
    print(row)
# 2. SELECT with WHERE Condition: Fetch rows where 'custkey' is greater than 1000
cur.execute("SELECT orderkey, custkey FROM orders WHERE custkey > 1000 LIMIT 5")
print("\nSELECT with WHERE Condition Results:")
for row in cur.fetchall():
    print(row)
# 3. SELECT with ORDER BY: Fetch and sort rows by 'totalprice' in descending order
cur.execute("SELECT orderkey, totalprice FROM orders ORDER BY totalprice DESC LIMIT 5")
print("\nSELECT with ORDER BY Results:")
for row in cur.fetchall():
    print(row)
# 4. SELECT with LIMIT: Fetch only the top 3 rows
cur.execute("SELECT orderkey, custkey FROM orders LIMIT 3")
print("\nSELECT with LIMIT Results:")
for row in cur.fetchall():
    print(row)
# 5. SELECT with Aggregation: Count the number of orders
cur.execute("SELECT COUNT(*) FROM orders")
count_result = cur.fetchone()
print(f"\nTotal Number of Orders: {count_result[0]}")
# Close the cursor and connection
cur.close()
conn.close()

Output (sample values; actual results depend on your data)

 
Basic SELECT Query Results:
(1, 370)
(2, 781)
(3, 1234)
(4, 1369)
(5, 445)

SELECT with WHERE Condition Results:
(3, 1234)
(4, 1369)
(6, 2375)
(7, 1945)
(9, 1850)

SELECT with ORDER BY Results:
(3, 193846.25)
(1, 173665.47)
(5, 144659.20)
(2, 46929.18)
(4, 32151.78)

SELECT with LIMIT Results:
(1, 370)
(2, 781)
(3, 1234)

Total Number of Orders: 15000

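Explanation

Basic SELECT query: returns only the `orderkey` and `custkey` columns, omitting the others, for up to five rows of the `orders` table.
SELECT with a WHERE condition: returns rows whose `custkey` is greater than 1000, limited to at most 5 rows.
SELECT with ORDER BY: sorts the results by `totalprice` in descending order and returns the top 5 rows.
SELECT with LIMIT: fetches at most 3 rows from the table.
SELECT with aggregation: counts the total number of rows (orders) in the `orders` table.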
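
The rows above come back as plain tuples. When column names matter, they can be read from the cursor's `description` attribute, which the DB-API defines as a sequence whose entries start with the column name. The helper `rows_as_dicts` below is a hypothetical name introduced for this sketch, and `sqlite3` again stands in for a Trino connection so that the example runs anywhere.

```python
import sqlite3  # stand-in for a Trino connection; both follow the Python DB-API


def rows_as_dicts(cursor):
    """Return the remaining rows of an executed query as dictionaries."""
    columns = [col[0] for col in cursor.description]  # first field is the column name
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (orderkey INTEGER, totalprice REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 173665.47), (2, 46929.18)])

cur.execute("SELECT orderkey, totalprice FROM orders ORDER BY orderkey")
records = rows_as_dicts(cur)
print(records)
# [{'orderkey': 1, 'totalprice': 173665.47}, {'orderkey': 2, 'totalprice': 46929.18}]

cur.close()
conn.close()
```

A list of dictionaries like this is also a convenient input for data analysis libraries.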

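Aggregation Queries

Counting rows: counts the number of rows in a table.

Syntax: `SELECT COUNT(*) FROM table_name`

Sum: computes the total of a specific column.

Syntax: `SELECT SUM(column_name) FROM table_name`

Average: computes the average of a specific column.

Syntax: `SELECT AVG(column_name) FROM table_name`

Grouping results: groups the results by a specific column and aggregates each group.

Syntax: `SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name`

Example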

 
import trino
# Establish connection to the Trino server
conn = trino.dbapi.connect(
    host='localhost',      # Replace with your Trino server hostname
    port=8080,             # Trino port
    user='your_username',  # Replace with your username
    catalog='tpch',        # Catalog (e.g., tpch)
    schema='sf1'           # Schema (e.g., sf1)
)
# Create a cursor object
cur = conn.cursor()
# 1. COUNT Query: Count the number of rows in the 'orders' table
cur.execute("SELECT COUNT(*) FROM orders")
count_result = cur.fetchone()
print(f"Total Number of Orders: {count_result[0]}")
# 2. SUM Query: Calculate the total sum of 'totalprice' in the 'orders' table
cur.execute("SELECT SUM(totalprice) FROM orders")
sum_result = cur.fetchone()
print(f"Total Sum of Prices: {sum_result[0]}")
# 3. AVG Query: Calculate the average value of 'totalprice'
cur.execute("SELECT AVG(totalprice) FROM orders")
avg_result = cur.fetchone()
print(f"Average Price: {avg_result[0]}")
# 4. GROUP BY Query: Count the number of orders per customer (custkey)
cur.execute("SELECT custkey, COUNT(*) FROM orders GROUP BY custkey LIMIT 5")
print("\nNumber of Orders per Customer:")
for row in cur.fetchall():
    print(row)
# Close the cursor and connection
cur.close()
conn.close()

Output (sample values; actual results depend on your data)

 
Total Number of Orders: 15000
Total Sum of Prices: 18812000.45
Average Price: 1254.13

Number of Orders per Customer:
(370, 15)
(781, 12)
(1234, 8)
(1369, 10)
(445, 7)

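Explanation

COUNT query
- Counts the total number of rows in the `orders` table, which is the total number of orders.
SUM query
- Computes the sum of the `totalprice` column, that is, the combined value of all orders.
AVG query
- Computes the average of the `totalprice` column, that is, the average order price.
GROUP BY query
- Groups the rows by `custkey` and counts the orders for each customer. The output is limited to five customers to keep it short.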
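
To see what these aggregates compute, it can help to reproduce them in plain Python on a handful of rows. The sketch below is only an illustration of the semantics; the four rows are made up for the example, and in practice the work is done by Trino, not by the client.

```python
from collections import Counter, defaultdict

# Made-up (orderkey, custkey, totalprice) rows, used only for this illustration.
orders = [(1, 370, 100.0), (2, 781, 50.0), (3, 370, 25.0), (4, 445, 80.0)]

order_count = len(orders)                         # SELECT COUNT(*)
total = sum(price for _, _, price in orders)      # SELECT SUM(totalprice)
average = total / order_count                     # SELECT AVG(totalprice)

# SELECT custkey, COUNT(*) ... GROUP BY custkey
orders_per_customer = Counter(custkey for _, custkey, _ in orders)

# SELECT custkey, SUM(totalprice) ... GROUP BY custkey
total_per_customer = defaultdict(float)
for _, custkey, price in orders:
    total_per_customer[custkey] += price

print(order_count, total, average)  # 4 255.0 63.75
print(dict(orders_per_customer))    # {370: 2, 781: 1, 445: 1}
print(dict(total_per_customer))     # {370: 125.0, 781: 50.0, 445: 80.0}
```

Note that the average is simply the sum divided by the count, which is a quick consistency check for results like those shown above.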

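JOIN Queries

Inner join: combines rows from two tables that match on a related column, placing their columns side by side.

Syntax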

 
SELECT * FROM table1 INNER JOIN table2 ON table1.common_column = table2.common_column;

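Left join: includes every row from the left table plus the matching rows from the right table; where there is no match, the right table's columns are `NULL`.

Syntax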

 
SELECT * FROM table1 LEFT JOIN table2 ON table1.common_column = table2.common_column;

Example

 
import trino
# Establish connection to the Trino server
conn = trino.dbapi.connect(
    host='localhost',      # Replace with your Trino server hostname
    port=8080,             # Trino port
    user='your_username',  # Replace with your username
    catalog='memory',      # Catalog (e.g., memory)
    schema='default'       # Schema (e.g., default)
)
# Create a cursor object
cur = conn.cursor()
# Create customers table
cur.execute("""
CREATE TABLE IF NOT EXISTS customers (
    customer_id INTEGER,
    first_name VARCHAR,
    last_name VARCHAR,
    email VARCHAR
)
""")
print("Table 'customers' created.")
# Create orders table
cur.execute("""
CREATE TABLE IF NOT EXISTS orders (
    order_id INTEGER,
    customer_id INTEGER,
    totalprice DECIMAL(10, 2)
)
""")
print("Table 'orders' created.")
# Insert sample data into customers
cur.execute("""
INSERT INTO customers (customer_id, first_name, last_name, email) VALUES
    (1, 'James', 'j', 'James.j@gmail.com'),
    (2, 'Stacey', 'S', 'Stacey.s@gmail.com'),
    (3, 'Ken', 'Davis', 'krish.Davis@gmail.com')
""")
print("Data inserted into 'customers'.")
# Insert sample data into orders
cur.execute("""
INSERT INTO orders (order_id, customer_id, totalprice) VALUES
    (101, 1, 250.50),
    (102, 1, 450.75),
    (103, 2, 120.00),
    (104, 3, 300.40)
""")
print("Data inserted into 'orders'.")
# 1. INNER JOIN: Get all customers with their orders
cur.execute("""
SELECT customers.customer_id, first_name, last_name, totalprice 
FROM customers 
JOIN orders ON customers.customer_id = orders.customer_id
""")
print("\nINNER JOIN Results:")
for row in cur.fetchall():
    print(row)
# 2. LEFT JOIN: Get all customers, with their orders (if available)
cur.execute("""
SELECT customers.customer_id, first_name, last_name, totalprice 
FROM customers 
LEFT JOIN orders ON customers.customer_id = orders.customer_id
""")
print("\nLEFT JOIN Results:")
for row in cur.fetchall():
    print(row)
# Cleanup: Drop tables
cur.execute("DROP TABLE IF EXISTS customers")
cur.execute("DROP TABLE IF EXISTS orders")
print("\nTables 'customers' and 'orders' dropped.")
# Close the cursor and connection
cur.close()
conn.close()

Output

 
Table 'customers' created.
Table 'orders' created.
Data inserted into 'customers'.
Data inserted into 'orders'.

INNER JOIN Results:
(1, 'James', 'j', 250.50)
(1, 'James', 'j', 450.75)
(2, 'Stacey', 'S', 120.00)
(3, 'Ken', 'Davis', 300.40)

LEFT JOIN Results:
(1, 'James', 'j', 250.50)
(1, 'James', 'j', 450.75)
(2, 'Stacey', 'S', 120.00)
(3, 'Ken', 'Davis', 300.40)

Tables 'customers' and 'orders' dropped.

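Explanation

INNER JOIN
- Returns the records whose `customer_id` appears in both the `customers` and `orders` tables.
- Only customers who have placed an order are included.
LEFT JOIN
- Returns every row of `customers` together with the matching rows of `orders`, where they exist.
- A customer with no orders still appears, with `totalprice` set to `NULL`. In this sample data every customer has at least one order, so both joins return the same rows.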
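
The difference between the two joins only becomes visible when some customer has no orders. The sketch below adds such a customer (a made-up row, `Maria`) and runs both joins. It uses `sqlite3` as a stand-in for a Trino connection, which is an assumption made so the example can run without a server; the join semantics shown are standard SQL.

```python
import sqlite3  # stand-in for a Trino connection; the join semantics are standard SQL

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (customer_id INTEGER, first_name TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, totalprice REAL)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "James"), (2, "Stacey"), (4, "Maria")],  # customer 4 has no orders
)
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(101, 1, 250.5), (103, 2, 120.0)])

cur.execute("""
SELECT c.customer_id, c.first_name, o.totalprice
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
ORDER BY c.customer_id
""")
inner_rows = cur.fetchall()

cur.execute("""
SELECT c.customer_id, c.first_name, o.totalprice
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
ORDER BY c.customer_id
""")
left_rows = cur.fetchall()

print(inner_rows)  # [(1, 'James', 250.5), (2, 'Stacey', 120.0)]
print(left_rows)   # [(1, 'James', 250.5), (2, 'Stacey', 120.0), (4, 'Maria', None)]

cur.close()
conn.close()
```

The SQL `NULL` in the unmatched row arrives in Python as `None`.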

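INSERT Queries

Inserting data: adds new rows to a table.

Syntax: `INSERT INTO table_name (column1, column2) VALUES (value1, value2)`

UPDATE Queries

Updating data: modifies existing rows in a table.

Syntax: `UPDATE table_name SET column1 = value1 WHERE condition`

DELETE Queries

Deleting data: removes rows from a table.

Syntax: `DELETE FROM table_name WHERE condition`

DROP TABLE Queries

Dropping a table: removes an entire table from the database.

Syntax: `DROP TABLE [IF EXISTS] table_name`

Example

The following code example shows how to run `INSERT`, `UPDATE`, `DELETE`, and `DROP` queries with the Trino Python client.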

 
import trino
# Establish connection to the Trino server
conn = trino.dbapi.connect(
    host='localhost',      # Replace with your Trino server hostname
    port=8080,             # Trino port
    user='your_username',  # Replace with your username
    catalog='memory',      # Catalog (e.g., memory for in-memory storage)
    schema='default'       # Schema (e.g., default)
)
# Create a cursor object
cur = conn.cursor()
# 1. Create the 'employees' table
cur.execute("""
CREATE TABLE IF NOT EXISTS employees (
    employee_id INTEGER,
    first_name VARCHAR,
    last_name VARCHAR,
    salary DECIMAL(10, 2)
)
""")
print("Table 'employees' created successfully.")
# 2. INSERT Query: Insert data into the 'employees' table
cur.execute("""
INSERT INTO employees (employee_id, first_name, last_name, salary)
VALUES
    (1, 'James', 'J', 60000.00),
    (2, 'Mahan', 'Stacey', 75000.00),
    (3, 'Ken', 'Johns', 50000.00)
""")
print("Data inserted into 'employees' table successfully.")
# Display inserted data
cur.execute("SELECT * FROM employees")
print("\nData after INSERT:")
for row in cur.fetchall():
    print(row)
# 3. UPDATE Query: Update the salary of an employee
cur.execute("""
UPDATE employees
SET salary = 80000.00
WHERE employee_id = 1
""")
print("\nSalary of employee with ID 1 updated.")
# Display updated data
cur.execute("SELECT * FROM employees")
print("\nData after UPDATE:")
for row in cur.fetchall():
    print(row)
# 4. DELETE Query: Delete an employee from the table
cur.execute("""
DELETE FROM employees
WHERE employee_id = 3
""")
print("\nEmployee with ID 3 deleted.")
# Display data after deletion
cur.execute("SELECT * FROM employees")
print("\nData after DELETE:")
for row in cur.fetchall():
    print(row)
# 5. DROP Query: Drop the 'employees' table
drop_query = """
DROP TABLE IF EXISTS employees
"""
cur.execute(drop_query)
print("\nTable 'employees' dropped.")
# Close the cursor and connection
cur.close()
conn.close()

Output

 
Table 'employees' created successfully.
Data inserted into 'employees' table successfully.

Data after INSERT:
(1, 'James', 'J', 60000.00)
(2, 'Mahan', 'Stacey', 75000.00)
(3, 'Ken', 'Johns', 50000.00)

Salary of employee with ID 1 updated.

Data after UPDATE:
(1, 'James', 'J', 80000.00)
(2, 'Mahan', 'Stacey', 75000.00)
(3, 'Ken', 'Johns', 50000.00)

Employee with ID 3 deleted.

Data after DELETE:
(1, 'James', 'J', 80000.00)
(2, 'Mahan', 'Stacey', 75000.00)

Table 'employees' dropped.

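Explanation

Here is a brief explanation of the code:

Establishing the connection: the code connects to the Trino server through the Python client using the host, port, user, catalog, and schema settings.
Creating the table: the code creates a table named `employees` with the columns `employee_id`, `first_name`, `last_name`, and `salary`. The table is created only if it does not already exist.
Inserting data: three rows are inserted into the `employees` table, each with an employee ID, first name, last name, and salary.
Updating data: the salary of the employee with `employee_id = 1` is updated to `80000.00`.
Deleting data: the employee with `employee_id = 3` is deleted from the table.
Displaying data: after each operation (insert, update, delete), the current contents of the table are printed so the change is visible.
Dropping the table: finally, the `employees` table created by this script is dropped.
Closing the connection: the cursor and the connection are closed at the end of the script.

Notes

The example uses the `memory` catalog, which keeps data in memory. Replace it with a catalog suited to persistent storage, such as `hive` or `postgresql`. Support for `UPDATE` and `DELETE` varies by connector, so check that the catalog you use supports the statements you need.
Permissions: make sure you have sufficient privileges to run `INSERT`, `UPDATE`, and `DELETE` queries on the table in question.
SQL standard: Trino supports standard SQL data manipulation statements, though the statements available for a given table depend on the connector.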

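Parameterized Queries

Using parameters: runs a query safely by passing the values separately from the SQL text.

Syntax: `cur.execute("SELECT * FROM table_name WHERE column_name = ?", [value])`

Example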

 
import trino
# Step 1: Establish a connection to the Trino server
conn = trino.dbapi.connect(
    host='localhost',  # Replace with your Trino host
    port=8080,         # Replace with your Trino port (default is 8080)
    user='your_user',  # Replace with your Trino user
    catalog='memory',  # Using in-memory catalog for demonstration
    schema='default',  # Replace with your schema
)
# Create a cursor object
cur = conn.cursor()
# Step 2: Create the table (this DDL statement takes no parameters)
create_table_query = """
CREATE TABLE IF NOT EXISTS users (
    id INT,
    name VARCHAR,
    age INT
)
"""
cur.execute(create_table_query)
print("Table 'users' created.")
# Step 3: Insert data using a parameterized query
insert_query = "INSERT INTO users (id, name, age) VALUES (?, ?, ?)"
# Parameters to insert
data_to_insert = [
    (1, 'Stacey', 25),
    (2, 'Bhargav', 30),
    (3, 'Ken', 35)
]
for data in data_to_insert:
    cur.execute(insert_query, data)
print("Data inserted into 'users' table.")
# Step 4: Select data with parameters
select_query = "SELECT * FROM users WHERE age > ?"
cur.execute(select_query, [28])
# Fetch all rows from the executed query
rows = cur.fetchall()
# Step 5: Display the output
for row in rows:
    print(row)
# Step 6: Clean up (drop the table)
drop_table_query = "DROP TABLE IF EXISTS users"
cur.execute(drop_table_query)
print("Table 'users' dropped.")
# Close the cursor and connection
cur.close()
conn.close()

Output

 
Table 'users' created.
Data inserted into 'users' table.
(2, 'Bhargav', 30)
(3, 'Ken', 35)
Table 'users' dropped.

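Explanation

Here is a brief explanation of each part of the code:

Connecting to Trino: we establish a connection to the Trino server by providing details such as the host, port, user, catalog, and schema. This is similar to logging in to the Trino system.
Creating a cursor object: the cursor acts like a pointer that lets us interact with the database and execute queries.
Creating the 'users' table: we define a `CREATE TABLE` query that creates a table named `users` with three columns: `id`, `name`, and `age`. The table is created only if it does not already exist.
Inserting data with parameters: we prepare an `INSERT INTO` query with `?` placeholders, then execute it once per row, passing the actual values (id, name, age) as parameters.
Selecting data with a condition: we prepare a `SELECT` query with a placeholder in the age condition (`age > ?`). When executing it, we pass the value `28` as the parameter to fetch all rows with an age greater than 28.
Fetching and printing results: we retrieve the rows that satisfy the condition and print them. The result shows the users older than 28.
Dropping the table: when finished, we clean up by dropping the `users` table so that no sample data remains.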

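Notes

The `trino` library supports parameterized queries through `?` placeholders: pass the values as a list or tuple in the second argument of `cur.execute()`. Avoid building queries with Python string formatting, which leaves them open to SQL injection.
In this example, the values are passed separately from the SQL text and bound to the placeholders in order.
Make sure the Trino server is running and that you can access the catalog and schema used in the example.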
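
The following sketch illustrates why placeholders are preferred over string formatting. It feeds the same hostile input to a query built with an f-string and to a parameterized query. The input string is made up for the illustration, and `sqlite3` serves as a stand-in for a Trino connection (an assumption made so the example runs without a server); both accept `?` placeholders.

```python
import sqlite3  # stand-in for a Trino connection; both accept "?" placeholders

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Stacey", 25), (2, "Bhargav", 30), (3, "Ken", 35)],
)

# Made-up hostile input that tries to widen the WHERE clause.
user_input = "Ken' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes the query's meaning.
cur.execute(f"SELECT name FROM users WHERE name = '{user_input}'")
unsafe_rows = cur.fetchall()

# Safe: the input is passed as a value and only ever compared as a string.
cur.execute("SELECT name FROM users WHERE name = ?", [user_input])
safe_rows = cur.fetchall()

print(len(unsafe_rows))  # 3
print(safe_rows)         # []

cur.close()
conn.close()
```

The formatted query returns every user, while the parameterized one correctly finds no user with that literal name.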

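Pros and Cons of the Trino Python Client

Here are the advantages and disadvantages of using the Trino Python client:

Advantages of the Trino Python Client

Easy to use: a simple API for connecting to Trino and running SQL queries from Python applications.
Support for many data sources: Trino can query data from Hadoop, MySQL, PostgreSQL, S3, and other sources.
Efficient result fetching: rows can be fetched in batches, which suits large datasets.
Parameterized queries: helps prevent SQL injection by supporting parameterized queries.
Integration with the Python ecosystem: easy to combine with data analysis libraries such as Pandas for post-query processing.
Customizable session settings: session properties give control over query execution parameters such as memory usage or execution time.
Support for prepared statements: SQL queries can be reused, which makes repeated tasks more efficient.
Error handling: errors are raised as exceptions so they can be handled gracefully.

Disadvantages of the Trino Python Client

Limited advanced features: some advanced Trino capabilities (such as complex distributed queries) may require additional manual configuration outside the Python client.
No built-in connection pooling: there is no native connection pool, which can be inefficient in highly concurrent environments.
Dependence on a Trino server: a running Trino server is required, so the client is of no use in standalone Python projects without server access.
Limited documentation: the Python client's documentation may be less extensive than that of some other Python libraries.
No built-in caching: query results are not cached by the client, so repeated queries can be slow.
Setup needed for large-scale queries: handling very large datasets may require careful configuration of both the Trino server and the client.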

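Best Practices for the Trino Python Client

Here are some best practices for working with the Trino Python client:

Connection reuse: reuse a connection for multiple queries rather than opening and closing connections frequently.
Error handling: always handle exceptions when running queries, especially when working with large datasets.
Security: use parameterized queries to avoid SQL injection attacks.
Efficient fetching: use `fetchmany()` to fetch rows in batches and avoid excessive memory use when processing large result sets.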
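
Several of these practices can be combined in a few lines. The sketch below reuses one connection, reads a result set in batches with `fetchmany()`, handles errors, and closes the cursor reliably. The helper `iter_batches` is a hypothetical name introduced here, and `sqlite3` stands in for a Trino connection (an assumption for this illustration). With Trino, the `except` clause would name one of the client's own exception classes instead of `sqlite3.Error`; the `trino.exceptions` module is understood to provide them.

```python
import sqlite3  # stand-in for a Trino connection; both follow the Python DB-API
from contextlib import closing


def iter_batches(cursor, size=1000):
    """Yield lists of at most `size` rows until the cursor is exhausted."""
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            return
        yield batch


conn = sqlite3.connect(":memory:")  # one connection, reused for every statement
batch_sizes = []
with closing(conn.cursor()) as cur:  # the cursor is closed even if a query fails
    cur.execute("CREATE TABLE numbers (n INTEGER)")
    cur.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(2500)])
    try:
        cur.execute("SELECT n FROM numbers")
        for batch in iter_batches(cur, size=1000):
            batch_sizes.append(len(batch))  # process each batch here
    except sqlite3.Error as exc:
        print(f"Query failed: {exc}")
        raise
conn.close()

print(batch_sizes)  # [1000, 1000, 500]
```

Only one batch is held in memory at a time, so the approach scales to result sets that would not fit in memory at once.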

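Conclusion

The Trino Python client is an elegant tool for working with Trino clusters from Python applications. With it, you can bring Trino's powerful query capabilities into data pipelines, large-scale analysis, and workflows that combine other Python tools.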
