What is the best way to loop through large arrays?

I have a list of subscribers in a table with 400,000 rows in total.
The table structure has just four columns:
id | amount | balance | date

Now the amount and balance column values are encrypted using ECB-mode encryption.

I want to increase the values of amount and balance by 2.5%.

I have two options:

  1. I can run a SQL statement to decrypt and adjust the balance in the table, but I don’t know how reliable this will be, especially for that many rows, or how long the update will take to complete.

  2. Fetch all the data, loop through it, decrypt each value, adjust it, and write it back.

Option two looks very rough, but I still need to know: if I go for option 2, what is the best way to loop without hitting a timeout?

I have heard of using generators, but I don’t know how to use them or how effective they can be for a large dataset.

Are the database and the webserver on the same machine?

For particularly sensitive information like that, my impulse would always be to move it as few times as possible. Limits the exposure.

Yes, the database and webserver are on the same machine.

Actually, the amount and balance are data encrypted at rest.

Yes, the data is encrypted. But what’s more protected, an envelope in a safe that gets moved from building to building in a truck, or an envelope in a safe that never sees the light of day?

In any case, if the database and webserver are running on the same machine, transportation isn’t the concern.

If you can run a SQL statement to decrypt and adjust, it’s probably going to be faster than pulling out 400,000 rows, decrypting, adjusting, and writing them back, right?

The SQL server and webserver are running on the same machine, so they have the same machine resources available for processing… I’m still inclined to let the SQL server do the lifting…
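For reference, if the values happen to have been encrypted with MySQL’s own AES functions, the whole job can stay inside the database as a single statement. This is only a sketch: the table/column names come from this thread, `'encryption_key'` is a placeholder, `AES_ENCRYPT` defaults to ECB mode, and keys used via PHP’s openssl functions are generally not interchangeable with MySQL’s, so test on a copy of the table first.

```python
# Hypothetical single-statement version, assuming the data was encrypted with
# MySQL's AES_ENCRYPT using the same key. AES_DECRYPT returns bytes, so the
# value is cast to a number before applying the 2.5% increase.
IN_DB_UPDATE = """
UPDATE subscribers
SET amount  = AES_ENCRYPT(CAST(AES_DECRYPT(amount,  'encryption_key')
                            AS DECIMAL(15, 2)) * 1.025, 'encryption_key'),
    balance = AES_ENCRYPT(CAST(AES_DECRYPT(balance, 'encryption_key')
                            AS DECIMAL(15, 2)) * 1.025, 'encryption_key')
"""
```

Run inside a transaction if the table engine supports it, so a failure part-way leaves nothing half-updated.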

My observations too, but I’m just curious to know how effectively and reliably MySQL can handle such a large decrypt, calculate, and re-encrypt for 300,000 rows without errors.

I don’t know how to quantify “effective” or “proficient”, as those would be subjective evaluations, and even if I tried, the answer would depend entirely on your server. Running the most efficient software in the world on a potato isn’t going to meet any metric that could be produced.

On that note, my machine is a large part of the determining factor in how effective and proficient the results can be.

If that’s the case, given the expected workload, what machine or server spec should I be looking at for faster and better performance?

I don’t… really know how to answer that either.

More bigger numbers = gooder.

The question is about avoiding the clamour for high-end turbo servers when, by default, a normal, less expensive server could handle such a dataset without blinking.

A potato can handle it. Can it handle it quickly? No. What defines quick enough? Not anyone on this side of the computer screen.

In that case, this starts a quest of finding the right-sized server by scaling up the moment slow response or performance is detected.

Speed isn’t much of a concern for me, because that could easily be traced or fished out.

My major worry is wrong or incomplete calculations, seeing as this has to do with figures.

It can cause loss of funds to both the company and the user if the SQL fails to update the total number of rows as expected.

Just like when you write an update query and can’t tell at first glance whether the update was successful, until a user submits a complaint that his data remained unchanged.

rowCount() will likely return only the number of rows affected.

Maybe comparing the total number of rows against rowCount() can give me a clue whether the update completed successfully.
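That comparison can be sketched in a few lines. One caveat worth knowing: with MySQL, the affected-rows count may exclude rows whose stored value happened not to change, so a small shortfall is not always a failure.

```python
# Sketch: sanity-check an UPDATE by comparing the expected row count
# (from a prior SELECT COUNT(*)) against the count reported by rowCount().
def update_covered_all(expected, affected):
    """True when the update reached every row it was supposed to."""
    return affected == expected

# usage sketch
assert update_covered_all(400_000, 400_000)
assert not update_covered_all(400_000, 399_871)
```

On a mismatch you would log the two numbers and alert an admin rather than silently continuing.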

But on the other hand, since I’m going to follow up the analysis in PHP to know the outcome and then email the admin in case of such an occurrence: if it fails due to a system or server crash, then rowCount() won’t return anything, and my corresponding actions won’t run, because the server currently needs oxygen and CPR.

Hey pandglobal,

I see you’re working with a substantial dataset and need to update encrypted values efficiently. Here’s a breakdown of your options:

  1. Decrypt and Update in SQL: While this is straightforward, it might not be the most efficient for such a large dataset, especially if your encryption keys are stored separately or if the decryption process is resource-intensive. It could also be risky if the decryption process isn’t reliable or if it’s slow.

  2. Fetch, Decrypt, and Update Programmatically: This approach gives you more control, but as you mentioned, it can be challenging with a large dataset. For a smoother process, consider the following steps:

     a. Batch Processing: Fetch data in manageable chunks, say 1000 rows at a time, to avoid timeouts and excessive memory usage.

     b. Use Generators: Generators can help manage memory efficiently by processing data one row at a time. Here’s a basic example of how you might use a generator for this task:

def fetch_encrypted_data(batch_size=1000):
    # Generator: fetch rows in batches (e.g. LIMIT/OFFSET queries)
    # and yield decrypted rows one at a time
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)  # one SELECT per batch
        if not batch:
            break
        for row in batch:
            yield decrypt_row(row)  # decrypt amount and balance
        offset += batch_size

def update_data():
    for row in fetch_encrypted_data():
        # Increase the amount and balance by 2.5%
        row['amount'] *= 1.025
        row['balance'] *= 1.025
        update_row(row)  # re-encrypt and write back

update_data()

This approach keeps memory usage low and makes the process more manageable. You can adjust the batch size based on your system’s resources and performance.

Overall, option 2 with batch processing and generators could be more efficient and reliable for updating a large dataset while managing timeouts and memory usage.


Thanks for the suggestions. yield is okay, but does that mean it can’t be affected by limits or timeouts?
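Worth noting here: a generator only controls memory, not the clock. Rows are produced one at a time instead of all at once, but the total runtime is the same, so a web request can still hit PHP’s max_execution_time; long jobs are usually run from the CLI or with the limit raised. A minimal illustration of the lazy behaviour:

```python
# A generator does no work until asked: declaring it is O(1) in memory,
# and each row is only produced on demand.
def rows(n):
    for i in range(n):
        yield {"id": i, "balance": 100.0}

gen = rows(1_000_000)   # nothing fetched yet
first = next(gen)       # only now is the first row produced
```

So the memory problem goes away, but the time-limit problem has to be solved separately (batching, CLI runs, or cron).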

  1. Let me assume that I want to leave the tables unencrypted and just run simple calculations like amount * pnl. If I have 1,000,000 rows, can I trust the result calculated in SQL that way?

  2. If I’m using yield and I have 14,000 rows of pnl to apply against 1,000,000 rows in SQL, going by pulling, decrypting, and updating back, that means hitting my SQL server a countless number of times.

Please can you reproduce this code in PHP, to let me understand it better?

And if I’m batching 100,000 / 1,000, that’s 100 batches, which is very okay to go with; but the truth is I won’t be there to run the remaining batches, since the updates are meant to happen automatically after trade execution.

It means I have to start something like a daemon that will run the updates in sequence.
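A daemon or cron job like that usually needs to be resumable: process rows in id order, checkpoint the last id handled, and continue from there on the next tick. A sketch of the idea (the list of dicts stands in for a `SELECT ... WHERE id > ? ORDER BY id LIMIT ?` query; in real use the checkpoint would be persisted, not kept in a variable):

```python
# Resumable batch runner: each pass handles batch_size rows past the last
# checkpoint, so a crash or restart picks up where the previous run stopped.
def run_batches(rows, last_id=0, batch_size=1000):
    processed = []
    while True:
        batch = [r for r in rows if r["id"] > last_id][:batch_size]
        if not batch:
            break
        for r in batch:
            r["balance"] *= 1.025          # the 2.5% adjustment
            processed.append(r["id"])
        last_id = batch[-1]["id"]          # checkpoint after each batch
    return processed, last_id
```

`rows` must be sorted by id for the checkpoint to be meaningful, which is what the ORDER BY guarantees in the real query.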

I have decided to leave the encryption out of the design so I can deal directly with figures.

But here is my biggest concern:

UPDATE table SET balance = (balance + (balance * $pnl / 100)) WHERE name = 'john'

Now, knowing that there are over 1,000,000 rows with john as the name:

Can you trust SQL to handle this without errors, miscalculation, server damage, or locking?

Because the second option will be to pull all the records, start looping to make the calculations, and after that start inserting them back.

There are a lot of variables in doing either; it truly depends on your server config and database setup. If you give your database a field called “is_processed” with a value of “yes” or “no”, then as it processes, it marks rows complete. Then on the next cron job, it asks for all rows where is_processed=“no” and only processes them. That’s what I would do to be safe. There are a lot of tweaks you can do along the way to get it just right for your needs.
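The is_processed pattern above can be sketched like this (column and flag names are the ones suggested; the list of dicts stands in for the table). The key property is that a crash mid-run just leaves unflagged rows for the next cron tick, and no row is ever adjusted twice.

```python
# One cron tick: adjust up to batch_size rows still flagged 'no', marking
# each as done. In MySQL this could be a single statement per tick, e.g.:
#   UPDATE subscribers SET balance = balance * 1.025, is_processed = 'yes'
#   WHERE is_processed = 'no' LIMIT 1000;
def process_once(rows, batch_size=1000):
    done = 0
    for r in rows:
        if done == batch_size:
            break
        if r["is_processed"] == "no":
            r["balance"] *= 1.025
            r["is_processed"] = "yes"
            done += 1
    return done   # 0 means the whole table is finished
```

When a tick returns 0, the run is complete, which also doubles as the completeness check discussed earlier in the thread.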


Please can you write this in PHP for me?

Sure, but after that, I have to get back to work. Give me a few minutes.

Here you go, hope it helps!

<?php
/*
	Remember to replace 'encryption_key' with your actual encryption key & make sure your encryption
	methods (openssl_decrypt and openssl_encrypt) match your encryption algorithm and key.
*/
function fetch_encrypted_data() {
	// Function to fetch encrypted data in batches and return decrypted values one row at a time
	global $conn;
	$batch_size = 1000;
	$offset = 0;

	while (true) {
		$query = 'SELECT `id`, `encrypted_amount`, `encrypted_balance` FROM `subscribers` ORDER BY `id` LIMIT ' . $batch_size . ' OFFSET ' . $offset;
		$result = mysqli_query($conn, $query);

		if (!$result) {
			// Handle query error
			die('Error in query: ' . mysqli_error($conn));
		}

		if (mysqli_num_rows($result) == 0) {
			break;
		}

		while ($row = mysqli_fetch_assoc($result)) {
			// Decrypt the amount and balance
			$decrypted_amount = decrypt_value($row['encrypted_amount']);
			$decrypted_balance = decrypt_value($row['encrypted_balance']);

			// Yield the decrypted row
			yield [
				'id' => $row['id'],
				'amount' => $decrypted_amount,
				'balance' => $decrypted_balance
			];
		}

		$offset += $batch_size;
	}
}

function decrypt_value($encrypted_value) {
	/*
		Function to decrypt values
		Replace this with the actual decryption logic
	*/
	$decrypted_value = openssl_decrypt($encrypted_value, 'aes-256-ecb', 'encryption_key');
	if ($decrypted_value === false) {
		/*
			Handle decryption error
		*/
		die('Error decrypting value');
	}
	return $decrypted_value;
}

function update_data() {
	/*
		Function to update the decrypted data by 2.5%
	*/
	global $conn;
	foreach (fetch_encrypted_data() as $decrypted_row) {
		// Update the amount and balance by 2.5%
		$new_amount = $decrypted_row['amount'] * 1.025;
		$new_balance = $decrypted_row['balance'] * 1.025;

		// Encrypt the new values
		$encrypted_amount = encrypt_value($new_amount);
		$encrypted_balance = encrypt_value($new_balance);

		// Update the database
		$id = $decrypted_row['id'];
		$query = 'UPDATE `subscribers` SET `encrypted_amount` = "' . $encrypted_amount . '", `encrypted_balance` = "' . $encrypted_balance . '" WHERE `id` = "' . $id . '"';
		$result = mysqli_query($conn, $query);
		if (!$result) {
			// Handle update error
			die('Error updating row: ' . mysqli_error($conn));
		}
	}
}

function encrypt_value($value) {
	/*
		Function to encrypt values
		Replace this with the actual encryption logic
	*/
	$encrypted_value = openssl_encrypt($value, 'aes-256-ecb', 'encryption_key');
	if ($encrypted_value === false) {
		// Handle encryption error
		die('Error encrypting value');
	}
	return $encrypted_value;
}

// Execute the update
update_data();
?>
