Python 解决方案：两个排序数组的中位数

2024 年 8 月 29 日 | 阅读 15 分钟

在这个问题中，我们将得到两个数组。设这两个数组为 array1 和 array2。这两个数组将是排序的，并且可以有不同的大小。设两个数组的大小分别为 n 和 m，其中 n 是 array1 中的元素数量，m 是 array2 中的元素数量。

让我们看一些例子来理解这个问题。

输入： array1 = [-5, 2, 7, 9, 11]，array2 = [-13, -10, -6, 3, 6, 13]

输出： 两个数组的中位数是 3。

如果我们合并两个数组，新数组将是排序的 = [-13, -10, -6, -5, 2, 3, 6, 7, 9, 11, 13]

由于元素数量为 11，即奇数个元素，中位数 = (n + m - 1) / 2，即第 6 个元素。Sorted[6] = 3

输入： array1 = [3, 6, 9, 10]，array2 = [4, 7, 10, 11, 12, 13]

输出： 两个排序数组的中位数是 9.50

如果我们合并两个数组，新数组将是排序的 = [3, 4, 6, 7, 9, 10, 10, 11, 12, 13]

由于元素数量为 10，即偶数个元素，中位数 = [(n + m) / 2, ((n + m ) / 2) + 1]，我们将取这两个值。因此，中位数是 (sorted[5] + sorted[6]) / 2 = 9.50。

方法 - 1

解决这个问题的最基本方法是创建一个第三个数组。这个第三个数组将存储两个数组的总和；因此，这个数组的总大小将是 n + m。我们将对这个新数组进行排序并找到中位数。我们必须检查 n + m 是奇数还是偶数才能找到中位数。如上所示，如果第三个数组的总长度是奇数，则中位数位于排序数组的 (n + m - 1) / 2 索引处。但是，如果第三个数组的长度是偶数，则中位数是排序数组中索引为 (n + m) / 2 和 ((n + m ) / 2) - 1 的两个元素的平均值。

这个想法与上面显示的示例非常相似。我们将按照以下步骤解决问题。

创建一个新数组 array3。array3 将包含 array1 和 array2 中的所有元素。我们可以通过对两个列表使用算术运算符来实现。Python 列表不是逐个元素相加的。
然后，我们将使用内置的 sorted() 函数对 array3 进行排序。
下一步是检查 array3 的长度是偶数还是奇数。
如果 array3 的长度是奇数，则中位数是 array3[(n + m - 1) // 2]。如果 array3 的长度是偶数，我们将找到 array3[n + m) // 2] 和 array3[((n + m ) // 2) - 1] 的平均值。

现在我们将看到上述方法的 Python 代码。

代码

# Python program to find the median of two sorted arrays.

# Defining a function that will take two arrays and return the median element
def median(array1, array2):

  n = len(array1)
  m = len(array2)

  # Creating the third array that will contain all the elements of array1 and array2
  array3 = array1 + array2

  # Now, we will sort the third array using the sorted() function
  array3 = sorted(array3)
  print("The sorted array is: \n", array3)

  # Checking if the length of array3 is even or odd
  if (n + m) % 2 != 0:
   # If the length is odd
   median = array3[(n + m - 1) // 2]

  else:
   # If the length is even
   median = (array3[(n + m) // 2] + array3[((n + m) // 2) - 1]) / 2
 
  # Returning the median value
  return median

# Creating two sorted arrays and passing them to the function
array1 = [-5, 2, 7, 9, 11]
array2 = [-13, -10, -6, 3, 6, 13]

print("The median of the sorted array is:", median(array1, array2))

array1 = [3, 6, 9, 10]
array2 = [4, 7, 10, 11, 12, 13]
print("The median of the sorted array is:", median(array1, array2))

输出

The sorted array is:
 [-13, -10, -6, -5, 2, 3, 6, 7, 9, 11, 13]
The median of the sorted array is: 3
The sorted array is:
 [3, 4, 6, 7, 9, 10, 10, 11, 12, 13]
The median of the sorted array is: 9.5

时间复杂度： 我们需要对大小为 (n + m) 的第三个数组进行排序。对该数组进行排序的时间复杂度为 O(Log (n + m))。

辅助空间： 我们使用了额外的内存空间来存储第三个数组，因此空间复杂度为 O(n + m)。

方法 - 2

我们将使用一种有效的方法来排序两个数组，从而降低时间复杂度。

给我们的两个数组是已排序的。我们可以利用这个属性，更有效地合并这两个数组，而不是先将它们相加然后排序。我们将跟踪已排序的数组数量，当达到两个数组的总元素数量的一半时，我们将打印中位数。在此方法中，我们还需要处理根据两个数组长度之和出现的两种情况。

如果长度之和为奇数，则中位数是合并两个数组后位于 (m + n) / 2 索引处的值。

如果长度之和为偶数，则中位数是合并两个数组后位于 (m + n) / 2 和 (m + n) / 2 + 1 索引处的值的平均值。

让我们进行一次干运行来理解这种方法。

让我们以两个排序数组 array1 = [3, 7] 和 array2 = [1, 2, 4, 5, 6] 为例。
这里 n = array1 的长度 = 2，m = array2 的长度 = 5。
长度之和 = 7，这意味着中位数是第 4 个值。
我们将运行一个从 0 到 3 的循环。
在第一次迭代中，我们将比较两个数组的第 0 个元素。由于 1 < 4，中位数 = 1
在第二次迭代中，我们将比较 array1 的第 0 个元素和 array2 的第 1 个元素。由于 2 < 4，m1 = 2
在第三次迭代中，我们将比较 array1 的第 0 个元素和 array2 的第 2 个元素。由于 3 < 4，m1 = 3
在最后一次迭代中，我们将比较 array1 的第 2 个元素和 array2 的第 2 个元素。由于 4 < 7，m1 = 4。
m1 的最终值是两个数组的中位数，即 4。
我们不需要继续，因为数组的长度是奇数。

以下是此方法的 Python 代码。

代码

# Python program to find the median of two sorted arrays in a linear time complexity.

# Defining a function that will take two sorted arrays and return the median of those arrays.
def median(array1, array2, n, m):

  # Initializing the pointers
  # This will keep track of elements of array1
  p1 = 0
  # This will keep track of elements of array2
  p2 = 0 

  # Initializing variables that will store the median values
  m1, m2 = -1, -1

  # Running a for loop, which will run till half of the length of the total length of two arrays
  for _ in range(((n + m) // 2) + 1):
    if(p1 != n and p2 != m):
      if array1[p1] > array2[p2]:
        m1 = array2[p2]
        p2 += 1
      else:
        m1 = array1[p1]
        p1 += 1
    elif(p1 < n):
      m1 = array1[p1]
      p1 += 1
    else:
      m1 = array2[p2]
      p2 += 1
      # We will return m1 if the total length is odd; otherwise we will return (m1 + m2) // 2
  return m1 if (n + m) % 2 != 0 else (m1 + m2) // 2


# Creating two sorted arrays and passing them to the function
array1 = [-5, 2, 7, 9, 11]
array2 = [-13, -10, -6, 3, 6, 13]
n1 = len(array1)
n2 = len(array2)
print("The median of the sorted array is:", median(array1, array2, n1, n2))

array1 = [3, 6, 9, 10]
array2 = [4, 7, 10, 11, 12, 13]
n1 = len(array1)
n2 = len(array2)
print("The median of the sorted array is:", median(array1, array2, n1, n2))

输出

The median of the sorted array is: 3
The median of the sorted array is: 9.5

时间复杂度：在此方法中，我们需要对合并后的数组进行排序。我们通过一个 for 循环找到了中位数。因此，此方法的 O(m + n) 时间复杂度。

辅助空间：与前一种方法不同，我们没有存储合并数组的地方。因此，此方法的空间复杂度是常数，即 O(1)。

方法 - 3

在此方法中，我们将使用二分查找。

正如我们在前一种方法中所见，排序数组的性质可以用来设计一种有效的方法来查找两个数组的中位数。对于排序数组最常用的方法之一是二分查找。我们将使用二分查找在不显式合并和排序的情况下划分两个数组。

为了找到中位数，我们需要找到一个点可以将合并后的数组分成两个相等的部分。我们可以使用二分查找在每个数组中单独找到一个点。当我们合并两个数组时，这些点也会合并，并将成为将合并后的数组分成两部分的点。

让我们举个例子：array1 = [-5, 2, 7, 9, 11]，array2 = [-13, -10, -6, 3, 6, 13]

我们需要在这两个数组上找到两个分割点。设 cut1 在第一个数组的 2 和 7 之间，cut2 在第二个数组的 3 和 6 之间。当我们分别合并两个半部分时，它们将是 half1 = [-13, -10, -6, -5, 2, 3] 和 half2 = [6, 7, 9, 11, 13]。
正如我们所见，3 是这个数组的中位数，而这两个分割点合并后形成合并后数组的两个部分。因此，我们必须通过二分查找找到两个排序数组的 cut1 和 cut2 的索引。

现在，我们将看到使用 Python 解决此问题必须遵循的步骤。

我们将对长度较大的数组应用二分查找。
为了找到中位数，我们应该知道中位数在哪里。正如我们在之前的例子中所见，合并成一个时，排序数组的中位数位于合并后数组的 (m + n + 1) / 2 处。我们将这个称为 median_position。
我们将找到长度较大的数组的中间点。设此为 array1，长度为 n。另一个数组是 array2，其长度为 m。array1 的中间点将是第一个数组的分割点。设中间点为 cut1。现在 cut1 左侧的元素数量等于 cut1 的值。显然，在合并数组的左侧，剩余的元素数量将是 median_position - cut1。这些元素将存在于第二个数组中。因此，第二个数组的分割点，即 cut2 = median_position - cut1。
在两个分割点之后，两个数组有四个部分。
- L1 = cut1 的左侧部分
- L2 = cut2 的左侧部分
- R1 = cut1 的右侧部分
- R2 = cut2 的右侧部分
但是为了得到最终答案，我们需要确认分割是否正确。要验证一个分割，我们需要检查 L1 <= R2 和 L2 <= R1 的值。如果这个条件成立，我们将检查总长度 (m + n) 是奇数还是偶数。如果总长度是偶数，我们将返回 max(L1, L2) 和 min(R1, R2) 的平均值。如果总长度是奇数，我们将返回 max(L1, L2)。
然而，如果条件不成立，我们将使用二分查找条件。如果 L1 的值大于 R2 的值，那么我们将 mid 移到 cut1 - 1。

代码

# Python program to find the median of two sorted arrays using the binary search method

# Defining a function that will take two arrays as its parameter and find the median using binary search
def median(array1, array2):

  # Finding the length of the two given arrays
  n = len(array1)
  m = len(array2)

  # We have to apply binary search on the array that has a bigger length
  # Assuming the array of length n is the bigger one if m > n, then we will swap the arrays and the lengths
  if (m > n):
    array1, array2 = array2, array1
    n, m = m, n

  # Initiaizing the pointers of the binary search
  l = 0
  h = n

  # Finding the actual position of the median in the merged sorted array
  median_position = (n + m + 1) // 2

  # Starting the while loop
  while (l <= h):
    mid = (l + h) // 2

    # Defining the cuts on the two arrays
    cut1 = mid
    cut2 = median_position - mid

    # Finding the elements that are on the left and right sides of the two arrays
    # We need to find these to validate the cuts and also to find th final answer
    # Here we will check the edge cases two, like if cut1 = 0, then cut1 - 1 will not be the left side of cut1
    l1 = array1[cut1 - 1] if cut1 != 0 else float('-inf')
    l2 = array2[cut2 - 1] if cut2 != 0 else float('-inf')
    r1 = array1[cut1] if cut1 != n else float('inf')
    r2 = array2[cut2] if cut2 != m else float('inf')

    # Checking if the current cuts are correct or not
    if l1 <= r2 and l2 <= r1:
      # If the cuts are proper, then we will find the median
      # If the total length of the two arrays is even, then we need to find the median of the two values that will be beside the cut of the merged array
      # These two values will be the maximum of the left sides and a minimum of the right sides
      if ((m + n) % 2 == 0):
        return (max(l1, l2) + min(r1, r2)) / 2.0

      # If the sum of lengths is odd, then the median is just the middle value, which will be the maximum of the left-side values
      return max(l1, l2)

    # If the cuts are not proper, then we need to shift the cuts
    # If the value of l1 > r2 that means we need to decrease l1 and increase r2.
    # this can be done by shifting cut1 to the left and shifting cut2 to the right
    # Since the binary search is applied on a single array, we will shift cut1, and cut2 will be shifted using median_position - cut1
    elif (l1 > l2):
      h = cut1 - 1
    else:
      # If the l2 > r1, then we have to do the opposite of what we did in the previous case.
      # We have to shift cut1 to the right and shift cut2 to the left
      l = cut1 + 1

# Creating the arrays and finding the median
array1 = [-5, 2, 7, 9, 11]
array2 = [-13, -10, -6, 3, 6, 13]
print("The median of the sorted array is:", median(array1, array2))

array1 = [3, 6, 9, 10]
array2 = [4, 7, 10, 11, 12, 13]
print("The median of the sorted array is:", median(array1, array2))

输出

The median of the sorted array is: 3
The median of the sorted array is: 9.5

时间复杂度：我们在长度较大的数组上应用二分查找。因此，时间复杂度将是 O(max(log m, log n))。我们应用了这个条件来通过其中一个长度为 0 的情况。

辅助空间：我们没有使用额外的空间来存储数组。因此，空间复杂度是常数，即 O(1)。

方法 - 4

在此方法中，我们将使用优先队列最小堆来查找中位数。

我们将通过迭代数组直到达到中位数索引来使用优先队列查找中位数。

让我们借助图示来理解算法。

array1 = [-5, 2, 7, 9, 11]，array2 = [-13, -10, -6, 3, 6, 13]，n = 5，m = 6

(n = array1 的长度，m = array2 的长度)

步骤 1：我们将初始化一个优先队列 p_q。我们将第一个数组的元素添加到优先队列中。

p_q.push(-5)

p_q.push(2)

p_q.push(7)

p_q.push(9)

p_q.push(11)

当我们将第一个数组的所有元素添加到优先队列时，最终队列将是 [-5, 2, 7, 9, 11]。

步骤 2：现在我们将第二个数组的元素添加到优先队列中。

p_q.push(-13)

p_q.push(-10)

p_q.push(-6)

p_q.push(3)

p_q.push(6)

p_q.push(13)

当我们添加第二个数组的元素到优先队列时，最终队列将是 p_q = [-13, -10, -6, -5, 2, 3, 6, 7, 9, 11, 13]

步骤 3：最后一步是找到中位数。我们将根据优先队列的长度是偶数还是奇数的情况来找到中位数。

如果优先队列的长度是奇数

那么，我们将对队列进行 (n + m) // 2 次迭代。在每次迭代中，我们将从队列中弹出一个元素。

在迭代结束时，队列的顶部元素将是中位数。

如果优先队列的长度是偶数

那么我们将迭代优先队列直到 (n + m) // 2 和 ((n + m) // 2) - 1。我们将返回队列顶部元素的平均值。

这是此方法的 Python 代码。

代码

# Python code to find the median of two sorted arrays using the Priority Queue

# Importing the library to use a queue data structure
import heapq

# Defining a function that takes two arrays and returns the median value
def median(array1, array2):

  # Calculating the lengths of the two arrays
  n, m = len(array1), len(array2)

  # initializing an empty Priority Queue
  p_q = []

  # First, we will add the elements of array1 to the priority queue
  for _ in range(n):
    heapq.heappush(p_q, array1[_])

  # Next, we will push the elements of array2 in the priority queue
  for _ in range(m):
    heapq.heappush(p_q, array2[_])

  # Total length of the priority queue
  total = n + m
  idx = -1
  m1, m2 = -1, -1

  # Using a while loop to traverse over the priority queue
  while p_q:

    # Maintaining the index value
    idx += 1

    # If the total length of the priority queue is odd, then we will return the middle element
    if total % 2 != 0 and idx == total // 2:
      return p_q[0]
    # We need to keep track of the two middle elements in case the total length of the priority queue is even
    # Then we will return the average of these mid-values
    if total % 2 == 0 and idx == (total // 2) - 1:
      m1 = p_q[0]
    
    # If the total length is even, then return the average of the middle elements
    if total % 2 == 0 and idx == total // 2:
      m2 = heapq.heappop(p_q)
      return (m1 + m2) / 2
    heapq.heappop(p_q)
  
  # This condition will be executed if the arrays are empty
  return 0.00000


# Creating the arrays and finding the median
array1 = [-5, 2, 7, 9, 11]
array2 = [-13, -10, -6, 3, 6, 13]
print("The median of the sorted array is:", median(array1, array2))

array1 = [3, 6, 9, 10]
array2 = [4, 7, 10, 11, 12, 13]
print("The median of the sorted array is:", median(array1, array2))

输出

The median of the sorted array is: 3
The median of the sorted array is: 9.5

时间复杂度：在这里，时间复杂度将是 O(max(n, m) * log(max(n, m)))。时间复杂度之所以如此，是因为我们对两个数组都使用了优先队列。

辅助空间：我们使用了额外的空间来存储优先队列。因此，空间复杂度等于优先队列的长度，即 O(n + m)。

下一个主题最大子数组和问题

Python 解决方案：两个排序数组的中位数

方法 - 1

方法 - 2

方法 - 3

方法 - 4

联系信息

关注我们

教程

面试题

在线编译器

Python

Java

.Net Framework

AI, ML and Data Science

Cloud Technology

B.Tech and MCA

Web Technology

PHP

Software Testing

Technical Interview

Java Interview

Python

Web Interview

Database Interview

B.Tech / MCA

Important Interview

Software Testing Interview

Company Interviews

Online Compilers

Multiple Choice Questions

Python 问题

Python 解决方案：两个排序数组的中位数

方法 - 1

方法 - 2

方法 - 3

方法 - 4

相关帖子

Python 中的 _name_ _main_

用于机器学习的最佳 Python 库

Python 中的柯里化

Python 中的移动应用程序自动化

Python 中的 Matplotlib.figure.Figure.add_subplot()

使用 Python 构建区块链

Python 中的高斯消元法

如何使用 Python 进行 Web 开发

Python 中的参数和形参

如何在 Windows 上安装 Python

订阅 Tpoint Tech

联系信息

关注我们

教程

面试题

在线编译器