DATA MANIPULATION WITH NUMPY AND PANDAS

NumPy and Pandas are two widely used Python libraries for data manipulation. NumPy provides the n-dimensional array (ndarray) and fast, vectorized operations on numerical data. Pandas builds on NumPy with labeled data structures, the Series and the DataFrame, for filtering, aggregating, merging, and transforming tabular data. Let's explore some common data manipulation techniques with each library:

Data Manipulation with NumPy:

  1. Creating NumPy Arrays: You can create a NumPy array from a Python list with numpy.array(), or generate one directly with functions such as numpy.arange().
```python
import numpy as np

# Creating a NumPy array from a list
data = [1, 2, 3, 4, 5]
numpy_array = np.array(data)  # Output: [1 2 3 4 5]

# Creating a NumPy array using arange() (the stop value 6 is excluded)
range_array = np.arange(1, 6)  # Output: [1 2 3 4 5]
```
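The sketch below shows a few more ways to create arrays. It covers arange() with a step, a two-dimensional array built from a nested list, and the standard constructors np.zeros() and np.linspace(). The variable names are illustrative.

```python
import numpy as np

# arange() accepts an optional step; the stop value is still excluded
evens = np.arange(0, 10, 2)  # [0 2 4 6 8]

# A nested list produces a two-dimensional array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape)  # (2, 3): two rows, three columns

# Other constructors: a fixed-size array of zeros, and evenly spaced floats
zeros = np.zeros(3)            # three float zeros
points = np.linspace(0, 1, 5)  # 0.0, 0.25, 0.5, 0.75, 1.0 (both endpoints included)
```

Note the difference between the two range functions. arange() is driven by a step size, while linspace() is driven by the number of points.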
  2. Array Operations: Arithmetic operators such as + and * act element-wise on NumPy arrays. They combine the values at matching positions without an explicit Python loop.
```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Element-wise addition
result_add = array1 + array2  # Output: [5 7 9]

# Element-wise multiplication
result_mul = array1 * array2  # Output: [ 4 10 18]
```
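Element-wise behavior also extends to scalars, through broadcasting, and to NumPy's universal functions. The minimal sketch below illustrates both. It also contrasts element-wise `*` with the `@` operator, which computes a dot product instead.

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# A scalar is broadcast across every element
scaled = array1 * 10  # [10 20 30]

# Universal functions (ufuncs) such as np.sqrt also apply element by element
roots = np.sqrt(np.array([1, 4, 9]))  # [1. 2. 3.]

# Contrast: * multiplies element-wise, while @ (like np.dot) computes the dot product
elementwise = array1 * array2  # [ 4 10 18]
dot = array1 @ array2          # 1*4 + 2*5 + 3*6 = 32
```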
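  3. Filtering and Indexing: Boolean indexing filters an array by a condition. Comparing an array to a value yields an array of True/False values (a mask), and indexing with that mask keeps only the elements where it is True.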
```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# data > 30 is the mask [False False False True True]
filtered_data = data[data > 30]  # Output: [40 50]
```
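Conditions can also be combined. NumPy uses the element-wise operators `&` and `|` rather than Python's `and`/`or`, and each comparison must be wrapped in parentheses because of operator precedence. The sketch below shows combined masks, `np.where()` for conditional replacement, and mask-based assignment. Its values are illustrative only.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and) and | (or); parenthesize each comparison
middle = data[(data > 10) & (data < 50)]  # [20 30 40]
edges = data[(data == 10) | (data == 50)]  # [10 50]

# np.where keeps the shape: take 30 where the condition holds, the original value elsewhere
capped = np.where(data > 30, 30, data)  # [10 20 30 30 30]

# A mask can also be used to assign values in place
data[data < 20] = 0  # data is now [ 0 20 30 40 50]
```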

Data Manipulation with Pandas:

  1. Creating Pandas DataFrames: A DataFrame can be built from a dictionary that maps column names to lists of values, or loaded from a file with a reader such as pd.read_csv().
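```python
import pandas as pd

# Creating a DataFrame from a dictionary (each key becomes a column)
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Chicago'],
}
df = pd.DataFrame(data)

# Reading data from a CSV file (assumes data.csv exists in the working directory);
# a separate name keeps df, which the examples below use, intact
csv_df = pd.read_csv('data.csv')
```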
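read_csv() accepts any file-like object as well as a path. This makes it possible to try it without a file on disk. The sketch below is an assumption-labeled stand-in for `data.csv`: it holds the same three rows as the dictionary example in a string, and the variable `csv_text` is hypothetical. `io.StringIO` then makes that string readable as a file.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for data.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,San Francisco
Charlie,22,Chicago
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 3): three rows, three columns
print(list(df.columns))  # ['Name', 'Age', 'City'], taken from the header row
print(df.head())         # first rows, useful for a quick look at loaded data
```

read_csv() infers column types from the data, so `Age` is parsed as an integer column here.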
  2. Data Selection and Filtering: Pandas lets you select columns by name and filter rows with Boolean conditions.
```python
# Selecting a single column (returns a Series)
names = df['Name']

# Filtering rows based on a condition
young_people = df[df['Age'] < 30]
```
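Beyond single columns and single conditions, a DataFrame supports several related selections. The minimal sketch below rebuilds the sample DataFrame from the dictionary example, then shows multi-column selection, label-based `.loc`, position-based `.iloc`, and combined row conditions. The result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Chicago'],
})

# Double brackets select several columns and return a DataFrame
subset = df[['Name', 'City']]

# .loc takes a row filter and a list of column labels
young_names = df.loc[df['Age'] < 30, ['Name', 'Age']]

# .iloc selects by integer position: row 0 is Alice's row, returned as a Series
first_row = df.iloc[0]

# Combine row conditions with & and |, wrapping each one in parentheses
young_in_chicago = df[(df['Age'] < 30) & (df['City'] == 'Chicago')]
```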
  3. Aggregation and Grouping: Pandas provides aggregation methods such as mean(), and groupby() splits the rows into groups so that an aggregation can be computed for each group.
```python
# Computing the mean age
mean_age = df['Age'].mean()

# Grouping by a column and computing the mean age of each group
grouped_data = df.groupby('City')['Age'].mean()
```
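In the sample data each city appears only once, so every group holds a single row. The sketch below uses a small hypothetical DataFrame, `people`, in which cities repeat. This makes the per-group means visible. It also shows `.agg()`, which computes several statistics per group in one call.

```python
import pandas as pd

# Hypothetical data in which some cities repeat, so each group has several rows
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana', 'Eve'],
    'Age': [25, 30, 22, 40, 28],
    'City': ['New York', 'Chicago', 'Chicago', 'New York', 'Chicago'],
})

# Mean age per city:
#   Chicago:  (30 + 22 + 28) / 3 = 80 / 3, about 26.67
#   New York: (25 + 40) / 2 = 32.5
mean_by_city = people.groupby('City')['Age'].mean()

# .agg() computes several statistics per group; the result has one column per statistic
summary = people.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```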
  4. Merging DataFrames: pd.merge() combines two DataFrames on one or more shared key columns, similar to a SQL join.
```python
import pandas as pd

# Merging two DataFrames based on a common column
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['New York', 'Chicago', 'Los Angeles']})

# The default is an inner join: only IDs present in both DataFrames (2 and 3) are kept
merged_df = pd.merge(df1, df2, on='ID')
```
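The `how` parameter of pd.merge() controls which keys survive the merge, and `indicator=True` records where each row came from. The sketch below reuses the two DataFrames from the merge example and compares the inner, left, and outer variants.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['New York', 'Chicago', 'Los Angeles']})

# how='inner' (the default): only IDs found in both frames, i.e. 2 and 3
inner = pd.merge(df1, df2, on='ID', how='inner')

# how='left': every row of df1 is kept; ID 1 has no match in df2, so its City is NaN
left = pd.merge(df1, df2, on='ID', how='left')

# how='outer': IDs from either frame (1, 2, 3, 4), with NaN filling the gaps
outer = pd.merge(df1, df2, on='ID', how='outer')

# indicator=True adds a _merge column: 'left_only', 'right_only', or 'both'
checked = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(checked)
```

The `_merge` column is a quick way to spot keys that failed to match before dropping or filling the resulting missing values.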

NumPy and Pandas cover a wide range of data manipulation tasks, which makes them essential tools for working with data in Python. NumPy suits fast numerical work on homogeneous arrays. Pandas adds row and column labels, mixed column types, and table-oriented operations such as grouping and merging. Together they handle everything from simple array arithmetic to complex data transformations.