Adobe Support Call Wednesday 28 November 2018

Who

  • Leo Berliant (Adobe)
  • John Bunker (Adobe)
  • Russ

What we did

  • Looked at log files on cmsa-dvlp-01
  • Discussed some of the workings of Blobs (see below)
  • Discussed some important configurations / processes to consider when / if we share the data store between author and publishers

References

Blob IDs

(We knew this / figured this out)

  • Blob IDs are file names.
  • With 6.1, blob IDs are the SHA-1 hash of the file content.
  • The first 3 bytes of the ID (6 hex characters) are used for the directory path of the file.

  • Example

[12:00:25] cms@cmsa-dvlp-01:datastore$ ls 00/01/09/
000109b4bb2908a86ab2fee517ba4d84e7fc4b2e
  • With 6.4, the Blob ID scheme was changed for security, so that the file name can no longer be guessed from the content alone.

  • I (Russ) believe they used the password salting approach of adding some value (my guess is the reference.key) to the content in order to salt the hash, thus changing the file name. I ran this past Leo, but he neither confirmed nor denied.

  • IF reference.key is the salt, then the application must know this value in order to calculate the hash.

    • It may be sufficient to simply read the file from the top of the blob store directory.
  • Upgrades do not require a change to the files. (In other words, Russ speculates that AEM detects the hash version and falls back to the old method when old files are present.)

  • Do newer versions of crx2oak produce the SHA-256 files, and older versions produce SHA-1?
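
The 6.1 layout above can be sketched in a few lines of Python (an illustrative helper, not AEM's actual code; the speculated salted 6.4 variant is deliberately omitted since it is unconfirmed):

```python
import hashlib

def blob_path(content: bytes) -> str:
    """Return the 6.1-style datastore path for a blob.

    The blob ID is the SHA-1 hash of the content; the first 3 bytes
    (6 hex characters) become three directory levels, as in the
    ls 00/01/09/ example above.
    """
    blob_id = hashlib.sha1(content).hexdigest()
    return "/".join([blob_id[0:2], blob_id[2:4], blob_id[4:6], blob_id])

# blob_path(b"hello") -> "aa/f4/c6/aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"
```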

NAS and I/O

We need to ensure our NAS has sufficient I/O performance to support our load.

  • We think AEM does not distinguish between "not found" and "not found quickly", at least not for normal levels of logging. This may lead it to think blobs do not exist, when the real problem is that the answer was not returned in time.

Sharing BlobStore between Author and Publisher(s)

  • This is a supported and recommended configuration, but care must be taken with respect to DataStore garbage collection (DSGC).
  • The biggest issue is that the various instances do not know about each other, so they cannot know which blobs the other instances depend on.

  • Remove the automatic garbage collection jobs from all instances, and perform GC manually or with your own jobs.

  • Each instance should run DSGC in --mark-only mode, which records the blobs that instance still references.
  • Then run the sweep phase of DSGC on a single node to remove the blobs that no instance has marked.
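
In Oak's shared-datastore DSGC, each instance's mark phase records the blob IDs it still references, and the final sweep may delete only blobs that appear in no instance's reference set. A toy Python sketch of that set logic (our own illustration, not oak-run code):

```python
def sweep_candidates(all_blob_ids: set, marks_per_instance: list) -> set:
    """Blobs that are safe to delete: those referenced by no instance.

    marks_per_instance holds one set per instance, each the output of
    that instance's mark-only DSGC run.
    """
    referenced = set().union(*marks_per_instance)
    return all_blob_ids - referenced

# With the author referencing {a} and a publisher referencing {b},
# only the orphan c may be swept:
# sweep_candidates({"a", "b", "c"}, [{"a"}, {"b"}]) -> {"c"}
```

This is why removing the sweep from all but one node matters: a sweep that sees only its own marks would delete blobs the other instances still need.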

Mongo blobs collection vs. nodes collection

  • Items of type jcr:data that are larger than 4 KB are stored in the Blob Store (either on disk or in Mongo).
  • Items of type jcr:data smaller than 4 KB are stored in the Node Store.
  • All other item types are stored in the Node Store.
  • Therefore, after extraction, it is safe to remove the blobs collection from MongoDB.
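
The routing rule above can be summarized in a short sketch (a simplification; the function and constant names are ours, and Oak's real cutoff is configurable rather than hard-coded):

```python
INLINE_THRESHOLD = 4096  # bytes; the ~4 KB cutoff described above

def storage_target(property_type: str, value: bytes) -> str:
    """Where a property value lands, per the rule above (simplified)."""
    if property_type == "jcr:data" and len(value) > INLINE_THRESHOLD:
        return "blobstore"
    return "nodestore"

# storage_target("jcr:data", b"x" * 5000)  -> "blobstore"
# storage_target("jcr:data", b"x" * 100)   -> "nodestore"
# storage_target("jcr:title", b"x" * 5000) -> "nodestore"
```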