https://github.com/matomo-org/matomo/pull/17408 - Mouve

Site d'origine

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store segment hash in DB #17408

Merged
merged 77 commits into from Apr 18, 2021
Merged

store segment hash in DB #17408

merged 77 commits into from Apr 18, 2021

Conversation

@flamisz
Copy link
Contributor

@flamisz flamisz commented Mar 30, 2021

Description:

fixes #17301

Review

  • Functional review done
  • Potential edge cases thought about (behavior of the code with strange input, with strange internal state or possible interactions with other Matomo subsystems)
  • Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
  • Security review done see checklist
  • Code review done
  • Tests were added if useful/possible
  • Reviewed for breaking changes
  • Developer changelog updated if needed
  • Documentation added if needed
  • Existing documentation updated if needed
@flamisz flamisz self-assigned this Mar 30, 2021
@flamisz flamisz added this to the 4.3.0 milestone Mar 30, 2021
@flamisz
Copy link
Contributor Author

@flamisz flamisz commented Mar 30, 2021

Hi @tsteur and @diosmosis,

this is very WIP, but I'd like to know if I go in the right direction.

A couple of questions I have:

  • should I check if the hash column exists or the updater will solve it anyway (I mean it can not happen that the code updated but DB not?)
  • when we create the update file? whenever we need a migration? so this PR should contain a new update file?
  • how we fill the hash column for the existing segments in the db? does that go into the update file?
  • @tsteur mentioned in the issue, ideally we can remove the urlencodes from this PR: https://github.com/matomo-org/matomo/pull/17029/files. can we @diosmosis?

sorry I have too many questions, I just'd like to understand more and do properly. thanks

@diosmosis
Copy link
Member

@diosmosis diosmosis commented Mar 30, 2021

should I check if the hash column exists or the updater will solve it anyway (I mean it can not happen that the code updated but DB not?)

It shouldn't happen that the code is updated but DB is not updated. If it does happen, then it's because of a bug in the update process, or because of user error. So you shouldn't have to check if the added column exists. (There are exceptions in this because sometimes during the one click update, some new code will execute after it is written to the hard disk, while the old database hasn't been updated, but this shouldn't be one of those cases.)

when we create the update file? whenever we need a migration? so this PR should contain a new update file?

Yes 👍 and updates are tied to a version, so first you'd bump the version in Version.php (setting to a new beta or rc based on what's in there already).

how we fill the hash column for the existing segments in the db? does that go into the update file?

Yes 👍 it should be initialized to urlencode($segment['definition']) in the update file.

@tsteur mentioned in the issue, ideally we can remove the urlencodes from this PR: https://github.com/matomo-org/matomo/pull/17029/files. can we @diosmosis?

I think it depends on how this is implemented. To get rid of those urlencodes, in the Segment constructor you'd have to search through every stored segment in the db for a definition that matches, then if found, use the hash in the database.

sorry I have too many questions, I just'd like to understand more and do properly. thanks

from what matt & thomas said, questions are to be encouraged 👍

@diosmosis
Copy link
Member

@diosmosis diosmosis commented Mar 30, 2021

I guess we don't have docs for writing updates/migrations, I'll put it on my list to try and write them (unless someone gets to them first).

@flamisz flamisz changed the title get segment hash store segment hash in DB Mar 30, 2021
@flamisz flamisz marked this pull request as ready for review Mar 31, 2021
@flamisz
Copy link
Contributor Author

@flamisz flamisz commented Mar 31, 2021

@tsteur and @diosmosis I'm wondering what else we need in this PR?

Can we get rid of those urlencode calls from the other PR? I checked that part of the code, and most of the time it gets the hash in the end of the process from the Segment, and now the getSegmentHash method does its best to find it (without anything, with urlencode and with urldecode as well).

$segmentTable = Common::prefixTable('segment');
$segments = Db::fetchAll('SELECT idsegment, definition from ' . $segmentTable);
foreach ($segments as $segment) {
$hash = md5(urlencode($segment['definition']));

This comment has been minimized.

@diosmosis

diosmosis Mar 31, 2021
Member

This should be md5($segment['definition']). Apologies, I know I said it needs to be urlencoded, I was referring to Segment construction. The Segment::getSegmentHash() function decodes the definition before md5-ing, so:

$segment = new Segment(urlencode($segment['definition']));
$hash = $segment->getHash();

is the same as

md5(urldecode(urlencode($segment['definition'])))

So if not going through Segment, calling md5 directly works (but only for definitions in the segment table).

This comment has been minimized.

@flamisz

flamisz Mar 31, 2021
Author Contributor

@diosmosis and @tsteur there is an issue with this one.

If we have a segment currently in a db like this actions%3E%3D2, current Matomo version will use the hash of the urlencoded version of it, so when we create the hash here:

$hash = md5($segment['definition']);

it won't be the same as we have in the archive_numeric tables.

One solution would be using $hash = md5(urldecode($segment['definition'])); in the update file.

We don't support double encoded segments in the db, do we? Only in URL/API?

This comment has been minimized.

@diosmosis

diosmosis Mar 31, 2021
Member

The segment condition value is encoded multiple times, not the segment condition operator.

The hash is computed overall with the fully urlencoded segment, because that is what is sent in the segment parameter. The stored segments are created by sending a definition= parameter, which does not have special treatment and gets urldecoded once automatically. So there is a discrepancy with what is in the stored segment table and what is sent as a query parameter (segment=urlencode(pageUrl==urlencode(urlencode(segmentValue))). The hash in the table has to match the hash computed w/ the fully encoded segment. The tests should cover this.

This comment has been minimized.

@diosmosis

diosmosis Apr 1, 2021
Member

The hash that gets put in the table needs to be the same hash that is created in this test: https://github.com/matomo-org/matomo/blob/4.x-dev/plugins/SegmentEditor/tests/System/ApiTest.php#L21 when run on 4.x-dev.

This comment has been minimized.

@flamisz

flamisz Apr 1, 2021
Author Contributor

thanks, @diosmosis, much more clear now.

core/Segment.php Show resolved Hide resolved
Copy link
Member

@tsteur tsteur left a comment

@flamisz looks like that should work 👍 left some minor comments

core/Segment.php Outdated Show resolved Hide resolved
core/Segment.php Outdated Show resolved Hide resolved
core/Segment.php Outdated Show resolved Hide resolved
@flamisz
Copy link
Contributor Author

@flamisz flamisz commented Apr 11, 2021

Hi @diosmosis, can you please review it again (hopefully last time) and we can merge it then?

@diosmosis
Copy link
Member

@diosmosis diosmosis commented Apr 11, 2021

When the tests pass, I'll take another look 👍

@flamisz flamisz force-pushed the 17301-store-segment-hash-in-db branch 2 times, most recently from a63912e to 2a1db98 Apr 11, 2021
@tsteur
Copy link
Member

@tsteur tsteur commented Apr 18, 2021

@flamisz @diosmosis was wondering if there's much left in this issue to do? Asking as we'd like to have this issue included in the next release

@diosmosis diosmosis merged commit 7b1b36c into 4.x-dev Apr 18, 2021
1 of 3 checks passed
1 of 3 checks passed
@github-actions
PHPCS
Details
@travis-ci
Travis CI - Branch Build Failed
Details
@travis-ci
Travis CI - Pull Request Build Failed
Details
@diosmosis diosmosis deleted the 17301-store-segment-hash-in-db branch Apr 18, 2021
@diosmosis
Copy link
Member

@diosmosis diosmosis commented Apr 18, 2021

@tsteur @flamisz merged it

@tsteur tsteur restored the 17301-store-segment-hash-in-db branch Apr 19, 2021
@tsteur tsteur deleted the 17301-store-segment-hash-in-db branch Apr 19, 2021
@tsteur tsteur restored the 17301-store-segment-hash-in-db branch Apr 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants

level de log mis à 1

Récupération du document https://github.com/matomo-org/matomo/pull/17408.

Récupération de l'URL https://github.com/matomo-org/matomo/pull/17408...

Raccourcis

Commandes

Fermer